This paper addresses how to calculate and interpret the time-delayed mutual information for a
complex, diversely and sparsely measured, possibly non-stationary population of time-series of
unknown composition and origin. The primary vehicle used for this analysis is a comparison between
the time-delayed mutual information averaged over the population and the time-delayed mutual
information of an aggregated population (here aggregation implies the population is conjoined
before any statistical estimates are implemented). Through the use of information theoretic tools, a
sequence of practically implementable calculations is detailed that allows the average and
aggregate time-delayed mutual information to be interpreted. Moreover, these calculations can also
be used to understand the degree of homo- or heterogeneity present in the population. To
demonstrate that the proposed methods can be used in nearly any situation, the methods are applied
to and demonstrated on the time series of glucose measurements from two different subpopulations of
individuals from the Columbia University Medical Center electronic health record repository,
revealing a picture of the composition of the population as well as physiological features.
In this paper we show how to apply time-delayed mutual information to a sparse, irregularly
measured, complicated population of time-dependent data. At a fundamental level, the technical
problem is a probability density function (PDF) estimation problem; specifically, one can average
PDF estimates or one can aggregate the data set before estimating the PDF. To understand and
interpret these two means of coping with a population of time-series, one must address four
issues: (i) estimator bias; (ii) normalization, or distribution support-based effects; (iii)
deviations from the single source case for average and aggregate; and (iv) practical
interpretation. Scientifically, this paper works to develop an infrastructure, and demonstrates
how to use it, by studying the time-dependent correlation structure in physiological variables
of humans, in a population of glucose time-series. In the end, we not only provide a practically
actionable set of information theoretic computations that yield insight into the population
composition and the time-dependent correlation structure, but we also detail the time-dependent
correlation structure and the degree of homogeneity within a broad population of humans via
their glucose measurements.
I. INTRODUCTION

It is no surprise that aggregating collections of ele- 
ments or data streams can allow for a productive analysis 
and understanding of the individual elements that make 
up the aggregated population. In fact, the aggregation 
of many elements into a measurable population can be 
pivotal in providing a means to study systems where the 
individual elements are difficult, expensive, or dangerous 
to measure. (Note that by aggregation, we mean com- 
bining sets of measurements in such a way that they can 
be treated as a single set of measurements that can be 
analyzed.) That aggregation provides a basis for analysis lies in the fact that the
application of most statistical methods, such as statistical averages, probability
density estimates, and techniques based on such fundamental methods (e.g., information
theory, ergodic theory, etc.), requires large numbers of data points. While some
fields have gained much from the analysis of aggregated 
populations of elements — such as advances made in the 
physical sciences with the advent of statistical mechanics 
— many fields have not been so fortunate. A primary 
source of difficulty with aggregation in these less fortu- 
nate contexts lies in the fact that fortune or ruin often 
depends on the ability to aggregate measured elements 
such that statistical averages can be taken. Usually this 
means one must have a population of elements whose sta- 
tistical properties being quantified are drawn from the 
same distributions. This requirement presents two inextricable problems: verifying
that a population is homogeneous enough to produce representative statistics when
aggregated, and determining whether a statistical analy- 
sis technique will yield the same outcome for the average 
over the population and for the aggregated population. 
With these broad issues in mind, here we focus on applying time-delayed mutual
information to a population in an attempt to understand the time-dependent
nonlinear correlation between measurements, or the degree of
predictability of measurements for members of a popula- 
tion. We wish to apply this, however, to a system whose 
members may: (i) have differing numbers of measure- 
ments; (ii) have too few measurements for probability 
densities (or any other statistical quantities) to be es- 
timated; (iii) be non-stationary; (iv) have very diverse 
underlying probability distributions or statistical states; 
and (v) be measured in a highly irregular manner in time. In short, this paper details
how to apply and interpret information theoretic analysis to a diversely measured,
possibly statistically diverse population that needs to be aggregated for the
information theoretic quantities to be calculable. Thus, this paper complements and
contrasts with research such as that presented in Ref. [1], where dynamical
reconstruction of uniformly measured stationary systems with short time-series is the focus.
The particular population we focus on in this paper is a 
subpopulation of human beings who received care at the 
Columbia University Medical Center (CUMC). The par- 
ticular time-series we are focusing on are clinical chem- 
istry measurements (measurements such as glucose, that 
detail physiological functioning of humans) for this pop- 
ulation. Nevertheless, it is important to note that the 
analysis presented is not limited to any particular popu- 
lation of measurements. 



A. A reader's guide: the outline of this paper 

Broadly, this paper can be split into two main components. The first component is
primarily theoretical and includes: a background section (III); a section about
TDMI-specific estimator bias (IV); a section focusing on how the TDMI for a population
can deviate from the TDMI of an individual stationary source (V); and finally a section
explaining how to use the TDMI population calculations to characterize diversity in a
population (VI).

Second, following the more theoretical sections, are the computational sections,
including: a section explaining how to use the TDMI population calculations to
characterize diversity in a population (VI); a section proposing some non-TDMI based
metrics for evaluating population diversity that help verify the TDMI-based methods
(VII); a section summarizing the TDMI methodology explicitly (VIII); and finally the
data-based section IX demonstrating the methodology. Regardless of intent, readers will
need to read the introduction sections I-III and the summary.



II. MOTIVATING EXAMPLES 

The theory-based motivation for this work is to devise a way to calculate and interpret
the time-delayed mutual information (TDMI) [2] [3] in the context of a population of
time-series that are both sampled irregularly and are from (possibly) statistically
distinct sources. More concretely, the motivation for this work comes from the desire to
understand human health dynamics (e.g., physiology, complex phenotype definitions such
as diseases, basic biology, etc.) based on the constraints of real data present in the
electronic health record (EHR) repository at Columbia University Medical Center (CUMC)
(note, CUMC is affiliated with NewYork-Presbyterian Hospital). These data represent all
the information that doctors at CUMC collect; the CUMC EHR is one of the oldest and most
complete EHRs in the country, and thus represents the type of data that future EHRs will
likely contain. EHR data are of note because EHRs contain most of the macroscopic,
biologically based data on humans in existence.



For instance, the CUMC EHR contains information regarding 2.5 million patients over 20
years and contains graphical images, laboratory data, drug data, doctor and nurse notes,
billing data, and demographic data, most of which is highly dependent on time; moreover,
the amount of data is growing exponentially. Despite the quantity of data, EHR data can
be difficult to use; in particular, EHR data are characterized by: diverse irregular
sampling, measurements correlated to statistical state, nonstationarity, a statistically
diverse population, very large populations with few measurements, and very diverse data
types. Nevertheless, if these data prove to be useful for understanding human dynamics,
a subject that is not completely without controversy [4] [5] [6], it may be possible: to
define complex diseases and other phenotypes (based on real, population scale data); to
understand how disease and treatment of disease evolve in complex and interconnected
ways [7] [8]; to define completeness of medical records; to correlate drugs to side
effects and benefits; to monitor population-wide disease spread and evolution; and to
carry out many other practical applications that can be gained from understanding
population-wide human health and biology. The approach upon which this work is based
represents a radical departure from the standard utilization of biomedical data; here
the data are studied using nonlinear physics methodology, an approach that has been
termed by some [9] the physics of living things.

Of course, another advantage of motivating the work in this paper with a data set with
complex properties is that it allows for the generalization of the results to many other
contexts whose data have a subset of the complexities. Outside of laboratory science,
nearly all data sets are difficult to control and have many of the same problems that
EHR data have. Thus, we claim that while we apply our analysis in the context of human
health and physiology, our methods can be easily generalized to nearly all
time-dependent contexts; e.g., astronomy [10], geology [11], climatology [12], and
genetics [13].






III. INFORMATION THEORY BACKGROUND 

Begin with a time-series, $X = (x_1(t_1), \ldots, x_N(t_N))$, of real numbers. Next,
denote all of the pairs of points in $X$ separated by either an index time,
$\tau = i - j$ (where $i > j$ are the indices of $t_i$ and $t_j$ respectively), or a real
time, $\delta t = t_i - t_j$ (again assume $t_i > t_j$), by $X[\tau]$ or $X[\delta t]$
respectively. Note that $\tau$ is always an integer while $\delta t$ can take continuous
real values. For this section we will limit the discussion to $X[\tau]$, but note that
the $X[\delta t]$ case follows identically. Note that in this circumstance, $X[\tau]$ can
be used to approximate a joint (two-dimensional) PDF; further, note that the marginal
distributions of $X[\tau]$ are approximated by $X[\tau](1) = X(t)$ and
$X[\tau](2) = X(t - \tau)$ respectively.
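
As an illustration only, the construction of $X[\tau]$ and $X[\delta t]$ can be sketched in Python with NumPy. The function names and the half-open window `[lo, hi)` used to bin the real-valued delay are assumptions made for this sketch (continuous delays have to be grouped in some way before pairs can be collected); they are not taken from the authors' implementation.

```python
import numpy as np

def pairs_index_delay(x, tau):
    """Pairs (x[i], x[i - tau]) for an integer index delay tau >= 1."""
    x = np.asarray(x, dtype=float)
    if tau < 1 or tau >= x.size:
        raise ValueError("tau must satisfy 1 <= tau < len(x)")
    return np.column_stack((x[tau:], x[:-tau]))

def pairs_time_delay(t, x, lo, hi):
    """Pairs (x[i], x[j]) with lo <= t[i] - t[j] < hi, for 0 < lo < hi."""
    if not 0 < lo < hi:
        raise ValueError("need 0 < lo < hi")
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    dt = t[:, None] - t[None, :]              # dt[i, j] = t_i - t_j
    i, j = np.nonzero((dt >= lo) & (dt < hi))
    return np.column_stack((x[i], x[j]))

# Regularly indexed series: X[tau] with tau = 2.
print(pairs_index_delay([1, 2, 3, 4, 5], 2))
# Irregularly sampled series: pairs roughly one time unit apart.
print(pairs_time_delay([0, 1, 3, 4], [10, 20, 30, 40], 0.5, 1.5))
```

The two columns of either result play the roles of the two marginals of the joint sample. The all-pairs time-difference matrix is quadratic in the series length, which is acceptable for sparsely measured series but would need a different approach for long ones.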

To estimate either the information entropy, or the 
TDMI for this time-series [5| [2], one must first esti- 
mate various probability density functions (PDF) [T^ . 
In order to specify a PDF, one needs to specify both the support of the PDF, $S$, and
the PDF itself, $p(X)$. Moreover, intuitively, the support of the PDF is the interval
over which the $x_i$'s lie, or, the support of the PDF of $X$ is
$S = [\min(X), \max(X)]$. However, when estimating a PDF from data, the support will
always be collected in a series of bins; thus, there also exists an abstract support,
$\mathcal{S}$, which consists of the explicit bins of the data used to estimate the PDF,
disconnected from the values the bins are assigned externally. Thus $\mathcal{S}$ does
not explicitly represent numbers in $X$; while this may seem like a strange point to
make, the difference between $S$ and $\mathcal{S}$ will be critical later in this paper.
Finally, note, we will always assume that PDFs in this paper have compact support
m- 

Now, given the random variable $X$ and its associated PDF, $p(X)$, the information
entropy of a time series generated by $X$ is defined by:

\[ H(X) = -\int_{S} p(X) \log(p(X))\, dx. \qquad (1) \]

Similarly, the TDMI is defined by:

\[ I(X(t); X(t-\tau)) = I(X[\tau]) = \int p(X(t), X(t-\tau)) \log \frac{p(X(t), X(t-\tau))}{p(X(t))\, p(X(t-\tau))}\, dX(t)\, dX(t-\tau). \qquad (2) \]

Thus the TDMI can be thought of as an auto-information measure that depends on a delay
(e.g., $\tau$ or $\delta t$).
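
To make Eq. (2) concrete, the following is a minimal plug-in sketch in which the PDFs are replaced by normalized two-dimensional histogram counts. It assumes an equal-width histogram with 8 bins per axis and natural logarithms; the function names and the AR(1) test signal are hypothetical choices for this illustration and are not the estimators or data used in this paper.

```python
import numpy as np

def histogram_mi(a, b, bins=8):
    """Plug-in mutual information (nats) between two equal-length samples."""
    counts, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = counts / counts.sum()              # joint PDF estimate on the bins
    p_a = p_ab.sum(axis=1, keepdims=True)     # marginal of the first variable
    p_b = p_ab.sum(axis=0, keepdims=True)     # marginal of the second variable
    mask = p_ab > 0                           # empty bins contribute nothing
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))

def tdmi(x, tau, bins=8):
    """Histogram TDMI of a single series at integer index delay tau >= 1."""
    x = np.asarray(x, dtype=float)
    return histogram_mi(x[tau:], x[:-tau], bins=bins)

# Hypothetical test signal: an AR(1) process with strong short-range memory.
rng = np.random.default_rng(0)
x = np.zeros(5000)
for k in range(1, x.size):
    x[k] = 0.9 * x[k - 1] + rng.normal()

near, far = tdmi(x, 1), tdmi(x, 50)
print(near, far)   # the short-delay value should clearly exceed the long-delay one
```

At long delays the estimate does not fall to exactly zero; what remains is dominated by estimator bias, which is the subject of section IV.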

Given this infrastructure, fundamentally there are two 
ways of conjoining a population: (i) averaging the TDMI 
for each member of the population; and (ii) aggregating 
the population before the PDFs are estimated without 
intermixing the members of the population. As we will 
see, in the context of a heterogeneous population, these 
two approaches will yield both differing numerical results 
and differing interpretations. 

Computationally it is important to note that we will 
employ both a KDE estimator [T5] [T7] [TH] and a stan- 
dard histogram estimator for all PDF calculations. We 



explicitly use the estimator developed in Ref. [TS] with a 
Gaussian kernel and a bandwidth of 100; the histogram 
estimator is of our own design and has a bandwidth of 
20. The results detailed in this paper are relatively in- 
sensitive to these parameter settings (e.g., a 10% change 
in the bandwidth will not produce a qualitatively dif- 
ferent result). Moreover, in this paper we will estimate 
the bias using the fixed point bias estimation technique 
[T^ . which amounts to various random permutations of 
temporal ordering of the time-series used to generate the 
PDFs and will be introduced in more detail in section 
IV B. Finally, while this paper only addresses the continuous case, the discrete case
follows more or less identically with integrals replaced by sums.



A. Average TDMI 

To formulate the average TDMI for a population, we 
begin by arguing that the average mutual information of 
a vector of individuals (a population) is the same as the 
average of the mutual informations of each individual, if 
the individuals are independent. These cases represent 
conjoining a population after the PDFs have been esti- 
mated; in essence we are just arguing that taking an av- 
erage before or after the TDMI integration is performed 
does not affect the resultant TDMI. 

Assume all processes are stationary. Define a vector-valued process $X$, where
$X(t) = [X_1(t), X_2(t), \ldots, X_N(t)]$; this leads to the following definition of
multivariate mutual information:

\[ I[X(t); X(t+j)] = \int p(X(t), X(t+j)) \log \frac{p(X(t), X(t+j))}{p(X(t))\, p(X(t+j))}\, dX(t)\, dX(t+j) \qquad (3) \]

noting that $p(\cdot)$ is the probability density associated with the given random
variable, and $X(\cdot)$ and $dX(\cdot)$ are both vectors. We want the following
statement to be true:

\[ \frac{1}{N} I[X(t); X(t+j)] = \frac{1}{N} \sum_{i=1}^{N} I[X_i(t); X_i(t+j)] \qquad (4) \]



We claim that the sufficient condition for Eq. (4) to hold is for the $X_i$ processes to
be non-interacting, or statistically independent. It is important to note that it is not
necessary that the $X_i$'s be non-interacting copies of the same process; the processes
only have to be statistically independent. It is not too difficult to verify our claim
algebraically: one merely applies the chain rule for mutual information to Eq. (4).
Moreover, conceptually understanding why our claim is correct is rather straightforward.
Begin by noting that if the $X_i$'s are independent, they form an orthogonal set of
probability densities, or a product measure on $N$-dimensional Euclidean space. Thus the
integral of each variable will be independent of the others simply because the variables
are orthogonal and thus not functions of one another (cf. Fubini's
theorem fIU\). 

The conclusion is that the average TDMI for the population is simply the canonically
calculated TDMI for the individuals of the population, averaged.



B. Aggregate TDMI 



To understand the construction where the population is aggregated before the PDFs are
estimated, assume, as we did in section III A, a stationary, vector-valued process $X$,
where $X(t) = [X_1(t), X_2(t), \ldots, X_N(t)]$, where $N$ denotes the number of
individuals in the population. Next, assume that each element emits a time-series of
length $n_i$; without loss of generality, in this section assume that $n_i = n$.

Aggregating the population into a time-series for which 
the PDFs can be estimated can be done in one of two 
ways. The first method involves concatenating the en- 
tire set of time-series into one scalar time-series of length 
Nn and then treating this concatenated time-series like a 
time-series from a single source; denote this aggregation 
method as inter-source aggregation. We will not study this method because it needlessly
adds noise by pairing points from different individuals and is hard to treat
mathematically. The second method, denoted the intra-source aggregation because sources
are not intermixed within pairs
of points, involves explicitly collecting pairs of points re- 
stricted to individuals. Specifically, the pairs of points 
are chosen such that the individual pairs of points al- 
ways originate from the same individual, and then these 
sets of pairs of points are conjoined such that the PDFs 
can be estimated. Thus, this method mixes individuals 
by including pairs of points from many individuals, but 
does not mix individuals by pairing points from differing 
individuals. 

To concretely specify what intra-source aggregation means, begin with the time series:

\[ (x_{11}, x_{12}, \ldots, x_{1n}, x_{21}, \ldots, x_{Nn}) \qquad (5) \]

where, given an $x_{ij}$, $i$ specifies the individual and $j$ specifies the time, and
consider a time-delay of $\tau$ for which the TDMI is to be calculated. The intra-source
pairs that will be aggregated and used for estimating the PDF are then:

\[ \begin{array}{c} (x_{1,1}, x_{1,1+\tau}) \\ \vdots \\ (x_{1,n-\tau}, x_{1,n}) \\ (x_{2,1}, x_{2,1+\tau}) \\ \vdots \\ (x_{2,n-\tau}, x_{2,n}) \\ \vdots \\ (x_{N,n-\tau}, x_{N,n}) \end{array} \qquad (6) \]

Thus, denote the left column by $X'$ and the right column by $X''$. Moreover, denote the
TDMI calculated between these two columns as $I(X', X'')$.

Much of the rest of this paper is dedicated to quanti- 
fying the implications and interpretations for when, and 
conditions under which the average and aggregate TD- 
MIs differ. However, by comparing average to aggregate 
TDMI we will also see that, very often (but not always), 
the aggregate TDMI will form an upper bound on the 
TDMI of an individual. 
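
The contrast between the two ways of conjoining a population can be sketched numerically. The example below is only an illustration under stated assumptions: a plug-in histogram estimator with 8 bins, hypothetical function names, and a synthetic population of white-noise individuals whose only difference is their mean. Each individual then has essentially no temporal correlation, yet the intra-source aggregate carries information because both points of every pair share an individual's mean.

```python
import numpy as np

def histogram_mi(a, b, bins=8):
    """Plug-in mutual information (nats) from a 2D histogram."""
    counts, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = counts / counts.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))

def average_tdmi(population, tau, bins=8):
    """Estimate the TDMI per individual, then average over individuals."""
    vals = [histogram_mi(x[:-tau], x[tau:], bins) for x in population if len(x) > tau]
    return float(np.mean(vals))

def aggregate_tdmi(population, tau, bins=8):
    """Pool intra-source pairs (never pairing across individuals), then estimate."""
    keep = [x for x in population if len(x) > tau]
    left = np.concatenate([x[:-tau] for x in keep])
    right = np.concatenate([x[tau:] for x in keep])
    return histogram_mi(left, right, bins)

# Hypothetical heterogeneous population: white noise around individual means.
rng = np.random.default_rng(0)
means = rng.uniform(-5, 5, size=50)
population = [rng.normal(loc=mu, scale=1.0, size=400) for mu in means]

avg = average_tdmi(population, tau=1)
agg = aggregate_tdmi(population, tau=1)
print(avg, agg)   # the aggregate value is expected to be the larger of the two
```

In this constructed case the aggregate TDMI exceeds the average TDMI even though no individual has any time-dependent structure, which is one way the two quantities can differ for a heterogeneous population.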



IV. TDMI-SPECIFIC ESTIMATOR BIASES 

All statistical estimates have bias associated with 
them. Here we focus on three sources of bias that are 
particular to the estimation of the TDMI for a popu- 
lation: (i) sample-size-dependent estimator bias effects 
for the average versus the aggregate TDMI; (ii) the ba- 
sic methodology we use for numerically estimating the 
bias for the TDMI calculation; and (iii) a source of non- 
estimator bias that is particular to the TDMI aggregation 
case — a sort of filtering bias. 

A. Sample size dependent estimator bias effects 

A practical reason why the order of aggregation mat- 
ters for estimating probability densities lies in the fact 
that most probability density estimation techniques have 
estimator bias that is, to first order, proportional to one 
over the number of points to a power of at least one. 
Thus, because we are interested in coping with popula- 
tions of poorly measured individuals, and because we are 
comparing two methods of conjoining those individuals, 
it is important to understand how the number of data 
points will broadly affect estimator bias in the average 
and aggregate TDMI calculations. 

Begin with a more computationally minded definition 
of the TDMI for a single time-series from a single source 
with n points: 

\[ I[X_i(t); X_i(t-j)] = I_{X_i}(n) + B_E(n) \qquad (7) \]

where $I_{X_i}(n)$ is the estimated TDMI for the $n$ pairs of
points of $X$ and $B_E(n)$ is the total estimator bias of the
calculation with n pairs of points. Note that while ex- 
plicit bias calculations for the entropy and TDMI calcu- 
lations can be found in Refs. 21j, [22^, and (lOJ, it will 
suffice to notice that for most PDF estimators (i.e., for 
kernel density estimators, or histogram style estimators), 
the bias estimates will follow: 

\[ B_E(n) \sim n^{-1} \qquad (8) \]
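
The scaling in Eq. (8) can be checked empirically with a small sketch. For two independent samples the true mutual information is zero, so the average plug-in estimate is itself an estimate of the estimator bias. The histogram estimator with 6 bins per axis, the uniform test data, and the function names are assumptions made for this illustration.

```python
import numpy as np

def histogram_mi(a, b, bins=6):
    """Plug-in mutual information (nats) from a 2D histogram."""
    counts, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = counts / counts.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))

def mean_bias(n, trials=200, bins=6, seed=0):
    """Average plug-in MI of two independent uniform samples of size n."""
    rng = np.random.default_rng(seed)
    vals = [histogram_mi(rng.random(n), rng.random(n), bins) for _ in range(trials)]
    return float(np.mean(vals))

b_small = mean_bias(200)     # bias with few points
b_large = mean_bias(2000)    # bias with ten times as many points
print(b_small, b_large, b_small / b_large)
```

If the bias falls off as $1/n$, a tenfold increase in the number of points should reduce it by roughly a factor of ten, which is what the printed ratio is meant to indicate.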

Nevertheless, it is worth noting that there is also an estimator-specific,
bandwidth-specific factor on $B_E(n)$ that is dependent on the proportion of support
(e.g., number of bins) for which there exist no data points, and this factor can be
important when $n$ is small (cf. [22], where this effect is carefully quantified for the
histogram estimator). To see how the bias of averaging the TDMI over the population and
the bias of the TDMI for the pre-PDF-estimation aggregated population differ, partition
the time-series of length $n$ into $m$ pieces, where $n/m$ is a positive integer (thus,
$m$ divides $n$ evenly and $n > m$). Now, consider the difference between
$I[X_i(t), X_i(t-j)]$ calculated on a single time-series of length $n$, and
$I[X_i(t), X_i(t-j)]$ calculated on $m$ disjoint time-series of length $n/m$ and then
averaged. More specifically, consider:

\[ I = I[X_i(t), X_i(t-j)] = I_{X_i}(n) + B_E(n) \qquad (9) \]

\[ I' = \frac{1}{m} \sum_{i=1}^{m} \left[ I_{X_i}(n/m) + B_E(n/m) \right] \qquad (10) \]

Now, if the bias, $B_E$, scaled linearly in the number of points, $n$, then the bias
contribution of Eq. (9) would be the same as the bias contribution of Eq. (10). However,
we know the bias obeys a power-law in the number of points, $n$, so we get the
difference between bias estimates to at least be:

\[ \delta B_E = \left( \frac{1}{m} \sum_{i=1}^{m} B_E(n/m) \right) - B_E(n) \qquad (11) \]

\[ = \frac{m-1}{n} \qquad (12) \]

where $\delta B_E > 0$ for all $m > 1$. Or, said differently,

\[ \frac{1}{m} \sum_{i=1}^{m} B_E(n/m) \geq B_E(n) \qquad (13) \]

where equality is satisfied only when $m$ is one, or when the population consists of a
single element. Note that when the population is particularly poorly sampled, say one or
two measurements per element of the population, then $m \approx n$ and thus the
difference in the bias of the population average versus the aggregated population will
be order one. More importantly, averaging the MI of many poorly sampled individuals will
not help the MI converge to its bias-free, high cardinality estimate.

Aside from the overall effect of $n$, there are other small sample size effects, and
these effects can have profoundly different outcomes depending on the estimator. For
instance, in the presence of few points, a KDE estimator will often, in the name of
smoothing, over-estimate the probability for empty portions of the support, resulting in
a PDF estimate that is closer to a uniform random variable. Thus, a KDE-PDF based TDMI
calculation will likely underestimate the TDMI. In contrast, a histogram estimator will
underestimate the probability for empty portions of the support, thus yielding a more
sharply peaked distribution that will yield an over-estimate of the TDMI. Because of
these opposing effects, it is possible to verify the existence of finite-size effects by
simply observing the difference between the KDE and histogram TDMI estimates for the
same data set.

In the end, because we are working to understand how to estimate the TDMI in the context
of large, poorly measured populations, there will be a significant advantage, from the
perspective of estimator bias minimization, to aggregating populations before estimating
the PDFs necessary to carry out the TDMI calculations.



B. Fixed point bias estimate for average and aggregate populations



The fixed point TDMI bias estimation method [T^ attempts to estimate the
$\tau = \infty$ TDMI by randomly permuting the time-ordering of one of the sets of pairs
used to estimate the distributions for a given $\delta t$ or $\tau$. Fundamentally,
there are two different methods for estimating the TDMI fixed point (if it exists):
random permutation within the individuals (i.e., not mixing individuals), and random
permutation over the entire population, thus intermixing individuals.

The first method, individual-wise random permutation (IRP), involves randomly permuting
the temporal ordering of one column (without replacement) of the data set used to
estimate the distributions without intermixing individuals, or:

\[ B_{IRP}(\tau, n) = \lim_{Z \to \infty} \frac{1}{Z} \sum_{i=1}^{Z} I(X', \tilde{X}''(i, t)) \qquad (14) \]

where $\tilde{X}''(i, t)$ is the $i$th random permutation (without replacement) of the
second (time) index of the column vector $X''$ (i.e., do not permute the first index of
$x_{ij}$ from Eq. (6)). The IRP-method random permutation occurs only within an
individual and not across the population, thus destroying information about only
time-based correlations while preserving inter-individual information. Finally, there
will exist an IRP bias estimate for both the average and aggregate TDMI cases:
$\bar{B}_{IRP}$, where Eq. (14) is specified for a single individual and then averaged
over the population, and $\hat{B}_{IRP}$, which is specified exactly as per Eq. (14).

The second method, population-wide random permutation (PRP), which exists only in the
aggregated population context, involves randomly permuting, without regard to the
individual, one column of the entire population's data set used to estimate the PDFs,
or:

\[ B_{PRP}(\tau, n) = \lim_{Z \to \infty} \frac{1}{Z} \sum_{i=1}^{Z} I(X', \tilde{X}''(i, N, t)) \qquad (15) \]

where $\tilde{X}''(i, N, t)$ is the $i$th random permutation (without replacement) of
both indices of the column vector $X''$. Because the PRP estimate intermixes both the
population and time, the PRP destroys information about both intra-individual time
correlations and inter-individual information (i.e., information about differences in
normalization or the supports). In the context of a single source,
$\bar{B}_{IRP}(n) = \hat{B}_{IRP}(n) = B_{PRP}(n)$. Similarly, when the population is
relatively uniform over both the PDFs and the support of the PDFs, then the PRP bias
estimate will be equivalent to the bias estimate of the IRP, and thus can be thought of
as an estimate of the estimator bias. However, if the support of the PDFs over the
population is not uniform (i.e., if the support of any of the individuals of the
population differs from the support of the population), then the PRP bias estimate will
differ from the IRP bias estimate (we will discuss this explicitly in section VI A).
Note that $\bar{B}_{IRP}$, $\hat{B}_{IRP}$, and $B_{PRP}$ are dependent on both $\tau$
or $\delta t$ (because of the filtering effect discussed in the next section) and $n$,
the number of points used in the estimate. In general, we will drop the $n$ from the
notation, and when there is not a $\tau$ or $\delta t$ dependence, we will not include
it in the notation (in general, for the data sets and $\delta t$'s we consider in this
paper, there is not a strong $\delta t$ dependence).
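
The difference between the two permutation schemes for the aggregated population can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: a plug-in histogram estimator with 8 bins, a finite number of permutations (`n_perm`) standing in for the limit in Eqs. (14) and (15), hypothetical function names, and synthetic white-noise individuals.

```python
import numpy as np

def histogram_mi(a, b, bins=8):
    """Plug-in mutual information (nats) from a 2D histogram."""
    counts, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = counts / counts.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))

def split_pairs(population, tau):
    """Per-individual (left, right) columns of intra-source pairs."""
    return [(x[:-tau], x[tau:]) for x in population]

def irp_bias(pairs, n_perm=20, bins=8, seed=0):
    """Aggregate-case IRP: shuffle the right column within each individual."""
    rng = np.random.default_rng(seed)
    left = np.concatenate([l for l, _ in pairs])
    vals = []
    for _ in range(n_perm):
        right = np.concatenate([rng.permutation(r) for _, r in pairs])
        vals.append(histogram_mi(left, right, bins))
    return float(np.mean(vals))

def prp_bias(pairs, n_perm=20, bins=8, seed=0):
    """PRP: shuffle the right column across the whole population."""
    rng = np.random.default_rng(seed)
    left = np.concatenate([l for l, _ in pairs])
    right = np.concatenate([r for _, r in pairs])
    vals = [histogram_mi(left, rng.permutation(right), bins) for _ in range(n_perm)]
    return float(np.mean(vals))

rng = np.random.default_rng(1)
# Hypothetical populations of white-noise individuals (no temporal correlation).
heterogeneous = [rng.normal(mu, 1.0, 200) for mu in rng.uniform(-5, 5, 40)]
homogeneous = [rng.normal(0.0, 1.0, 200) for _ in range(40)]
het, hom = split_pairs(heterogeneous, 1), split_pairs(homogeneous, 1)

irp_het, prp_het = irp_bias(het), prp_bias(het)
irp_hom, prp_hom = irp_bias(hom), prp_bias(hom)
print(irp_het, prp_het, irp_hom, prp_hom)
```

For the homogeneous population the two estimates should roughly coincide, while for the population with differing supports the IRP estimate retains the inter-individual information that the PRP estimate destroys.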



C. Non-estimator bias: how the TDMI calculation can act as a population filter

While it is clear that the TDMI calculation only applies to the data used to estimate
the PDFs, it is less obvious that the act of constructing the data sets used to estimate
the PDFs can filter out substantial portions of the overall population. Specifically,
because construction of the data sets for the PDF estimation involves collecting all
pairs of points separated by some time $\tau$ or $\delta t$, if some individuals do not
have pairs of points separated by $\tau$ or $\delta t$, those individuals will be
filtered out of, or excluded from, the data set used to estimate the PDFs and thus the
TDMI. In this sense, the TDMI calculation implicitly filters the population by
measurement frequency; this is not an externally imposed data constraint, it is simply a
result of calculating the TDMI in the context of a population whose elements do not have
identical measuring frequencies.

To understand how this filtering bias can affect the results, consider a polarized
example population made up of two differently measured subsets of individuals.
Specifically, the first subset of the population has individuals sampled once an hour
for a month and the second subset of the population has individuals sampled once a month
for 20 years. These two populations represent patients with acute and chronic
conditions, respectively. If the TDMI of the population is calculated for any $\delta t$
less than a month, only data set one will be represented. Similarly, if the TDMI is
calculated for $\delta t$ of a month or greater, only data set two will be represented.
When plotting the TDMI graph versus $\delta t$, the graph has, in a sense, a bias.
Namely, the graph represents two disjoint populations: one for $\delta t$ less than one
month and the other for $\delta t$ of one month or greater.

Of course, for real EHR data, even more complicated problems can appear when the same
individual is sampled at different rates depending on the statistical state of the
individual (e.g., a patient with both a chronic and an acute condition). This problem is
particularly acute for health care data because health correlates with presence of
measurement (healthy patients are not measured often while sick patients are), thus
leading to the possibility of having different subpopulations or statistical states
being filtered out when calculating the TDMI for some $\delta t$ values.

Thus, when estimating a TDMI for a population, it is important to quantify both who is
populating the data set explicitly used to estimate the PDFs and how the proportionality
of the subpopulations changes in the set used to estimate the PDFs as the delay is
changed. If the population and proportionality of subpopulations in all the $\delta t$
or $\tau$ TDMI estimates does not change, then the bias estimates are independent of the
delay.



Methods for assessing $\delta t$ bin compositions



To quantify the composition of the data set, begin with the following notation: (i)
$b_i(\tau)$ represents the number of pairs of points in the $\tau$ time bin contributed
by individual $i$; (ii) $b_{max}(\tau) = N_{max}$ and $b_{min}(\tau) = N_{min}$
correspond to the maximum and minimum number of pairs of points, over all individuals,
present in the data set; $N_\tau$ represents the sum of $b_i(\tau)$, or the total number
of pairs of points in the data set; (iii) $N$ represents the total number of individuals
in the population; and (iv) $\xi(\tau)$ represents the set of indices of individuals
monotonically ordered by increasing $b_i$. Based on these quantities, define the
following functions:

\[ \Theta(\xi(\tau)) = b(\xi), \qquad (16) \]

\[ \hat{\Theta}(\xi) = \frac{b(\xi)}{b_{max}} \qquad (17) \]

noting that $\hat{\Theta}(\tau)$ [57] is $\Theta(\tau)$ normalized to lie on the unit
square. Next, define the following integral that quantifies
the population composition of the data set:



\[ H_{\Theta}(\tau) = \int_0^1 \hat{\Theta}(\xi)\, d\xi. \qquad (18) \]



When the time series of the members of the population are both uniformly sampled and of
the same length, $H_\Theta(\tau)$ will be equal to one; thus the closer $H_\Theta(\tau)$
is to one, the more the composition of the data set includes the entire population
uniformly, while the closer $H_\Theta(\tau)$ is to zero, the more the composition of the
data set represents a small subset of the population (possibly only an individual). A
second, more gross quantification of how the population is represented in the TDMI data
set at a fixed $\delta t$ is the percentage of individuals that contribute at least one
pair to the data set, or:



\[ H_{0<\Theta}(\tau) = \frac{1}{N} \left| \{ i : b_i(\tau) > 0 \} \right| \qquad (19) \]



Note that an alternative, highly related quantity we have found useful is the cumulative
distribution function (CDF) of the $b_i$'s.

Finally, while it is tempting to think of the population makeup of the $\tau$ data set
as a measure of homogeneity within a population, this interpretation is sometimes, but
not always, correct. What $H_\Theta(\tau)$, $H_{0<\Theta}(\tau)$, or any other
like-minded metric really details is how a population is measured and thus represented
in a given $\tau$ or $\delta t$ bin. Specifically, when measurement frequency is
correlated with statistical state or dynamics, then it is likely that $\tau$ bins will
filter a population and make it more homogeneous. However, it is easy to think of
examples where measurement frequency is random, or uncoupled from a statistical state or
dynamics, and in this case, all the diversity of the population will be present in any
given $\tau$ time bin.



V. POPULATION-BASED DEVIATIONS FROM 
THE INDIVIDUAL TDMI ESTIMATES 

A. Heterogeneity-based deviations from the 
individual: average TDMI case 

To understand how representative the average MI over the population is of an
individual in the population, begin by setting $p_{\bar i}$ as the PDF that most
resembles the average (choosing $p_{\bar i}$ to be the median among the $p_i$'s would
work as well) among the set of $p_i$'s relative to the abstract support, $\bar S$; note
that the average PDF is defined by:

$$\bar p = \frac{1}{N} \sum_{i=1}^{N} p_i(\bar S) \qquad (20)$$



Note that in this situation, every $p_i$ has the same abstract support (by
definition), which we will denote as $\bar S$. Further, note that it is possible to
have a set of $p_i$'s such that no $p_i$ resembles the mean graph of the $p_i$'s. Next,
relative to $p_{\bar i}$ we can now relate each $p_i$ to $p_{\bar i}$ as follows:

$$p_i = p_{\bar i}(\bar S) - \epsilon_i(\bar S) \qquad (21)$$

where $\epsilon_i(\bar S)$ is the distance between the graphs of $p_{\bar i}$ and $p_i$
at a given value in $\bar S$. Recalling the definition of the TDMI, we get:



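$$\bar I[X(t); X(t+\tau)] = \frac{1}{N} \sum_{i=1}^{N} \int p(X_i(j), X_i(j-\tau)) \log\left( \frac{p(X_i(j), X_i(j-\tau))}{p(X_i(j))\, p(X_i(j-\tau))} \right) dX(t)\, dX(t+\tau) = \int \bar\iota(\tau)\, dX(t)\, dX(t+\tau). \qquad (22)$$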



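Now, because integration is a linear operation, focus on the integrand instead, or
more specifically, focus on:

$$\bar\iota(\tau) = \frac{1}{N} \sum_{i=1}^{N} p(X_i(j), X_i(j-\tau)) \log\left( \frac{p(X_i(j), X_i(j-\tau))}{p(X_i(j))\, p(X_i(j-\tau))} \right) \qquad (23)$$
$$= p(X_{\bar i}(j), X_{\bar i}(j-\tau)) \log\left( \frac{p(X_{\bar i}(j), X_{\bar i}(j-\tau))}{p(X_{\bar i}(j))\, p(X_{\bar i}(j-\tau))} \right) + \bar G(N, \epsilon_i, p(X_{\bar i}(j), X_{\bar i}(j-\tau)), p(X_{\bar i}(j)), p(X_{\bar i}(j-\tau)))$$
$$= \bar P(\tau) + \bar G(\tau)$$

where $\bar G(\tau)$ is given by: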
G(r) = 



1 

N 



log 



p(Xl(j),Xl(j-T)) 



p(Xi(j),Xi(j-r)) 
piX,{j))piX,{j ~ r)) 
1 - 



(24) 




(for a more explicit calculation of $\bar I$, cf. appendix A 1). As each $\epsilon_i$
goes to zero, $\bar G$ goes to zero; thus the more support-independent variance (recall
$\epsilon_i$ is relative to the abstract support $\bar S$) there is within the
population, the larger $\bar G$ will be, and the less $\bar I(\tau)$ will represent the
TDMI of an individual element within the population. Written explicitly,
$\bar I(\tau)$ represents the "average" individual plus the sum of the deviations from
that individual.



1. Entropy of the averaged population 

While the primary topic in this paper is the TDMI, we will contend briefly with the
TDMI for $\tau = 0$, or the autoinformation. Based on an identical means of
calculation, the information entropy of a time series for a population can be defined
as follows:



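$$h_{\bar I} = -\frac{1}{N} \int \left[ p_{\bar i} \log(p_{\bar i}) + p_{\bar i} \sum_{i=1}^{N-1} \log(p_{\bar i} - \epsilon_i) - \sum_{i=1}^{N-1} \epsilon_i \log(p_{\bar i} - \epsilon_i) \right] dx \qquad (25)$$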



Thus, when $\epsilon_i \to 0$, the $h_{\bar I}$ for the population relative to the
abstract support tends toward the information contained in an individual.



B. Heterogeneity-based deviations from the 
individual: aggregate TDMI case 

To understand how the diversity in the population is rendered via the TDMI of the
aggregated population, begin by recalling that the TDMI for the aggregated set is
defined by:



/(r) ^I{Xr^;X!^) 



(26) 



'pixrnpix-)' 



: J LiT)dXl'-^dX'^ 



where, under ideal (single, stationary source) circum- 
stances the PDF of the aggregated density obeys 



1 



N 



-^p(Xr-(^);X;^(^)) 

i=l 



(27) 



where X^~^{i) and X"(j) represent the PDF restricted 
to individual $i$. Intuitively, Eq. 27 just says that we are creating the aggregate
PDF by summing the graphs of all the individuals relative to the union of the supports
of all the individuals, that is, relative to $\hat S = \cup_{i=1}^{N} S_i$.
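The statement that the aggregate PDF is the mean of the individual graphs over the union of the supports can be checked with histogram estimates. The sketch below is illustrative only: the three Gaussian "individuals", the bin count, and all variable names are assumptions, and the equality holds exactly here only because every individual contributes the same number of samples (with unequal counts the aggregate is a count-weighted mean instead).

```python
import numpy as np

rng = np.random.default_rng(0)
# Three hypothetical individuals: Gaussian samples with means 0, 2 and 4.
samples = [rng.normal(loc=m, scale=1.0, size=1000) for m in (0.0, 2.0, 4.0)]

pooled = np.concatenate(samples)
# Common bins spanning the union of the individual supports.
edges = np.linspace(pooled.min(), pooled.max(), 41)

# Each individual's PDF estimated on the common (union) support.
individual_pdfs = np.array(
    [np.histogram(s, bins=edges, density=True)[0] for s in samples]
)
mean_of_pdfs = individual_pdfs.mean(axis=0)

# PDF estimated after aggregating (pooling) the raw data.
aggregate_pdf = np.histogram(pooled, bins=edges, density=True)[0]

# Largest pointwise gap between the two estimates.
max_gap = float(np.max(np.abs(mean_of_pdfs - aggregate_pdf)))
```

With equal sample counts the gap is zero up to floating-point rounding, which is the histogram analogue of summing the individual graphs over the union of the supports.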

To choose a PDF that most closely resembles a centroid, it is helpful to have a
concept of abstract support; however, because $\hat I(\tau)$ is defined relative to the
actual support of the population, the individual population PDFs do not separate as
naturally as in the $\bar I(\tau)$ case. Nevertheless, conceptually, to define an
abstract support in the aggregate circumstance, one needs to, in spirit, construct a
situation where all the PDFs have roughly the same range or support. There are several
ways one can imagine achieving such a goal; here we will define the abstract support,
$\bar S$, such that every patient has been renormalized to have the identical support,
the unit interval (i.e., [0,1]). It is important to realize that relative to the
aggregate case there can be a very severe difference between the TDMI of an aggregated
population defined on the total support $\hat S$ versus the abstract support $\bar S$.
To allow for quantifying these potential differences, define the TDMI for an aggregated
population relative to the abstract support, $\hat I_{\bar S}(\tau)$. Now, using the
abstract support, select $p_{\bar i}$ in the same way we selected $p_{\bar i}$ in the
previous section, by selecting the PDF that most closely represents the mean over the
population of PDFs relative to the abstract support. This definition implies an
important difference in how $p_{\bar i}$ is specified in the aggregate case versus the
average case because, despite the fact that we use an abstract support to select a
$p_{\bar i}$, $\hat I(\tau)$ is not calculated relative to the abstract support, and
thus the differences between $p_{\bar i}$ and $p_i$ are instead defined by:



$$p_i = p_{\bar i}(\hat S) - \hat\epsilon_i(\hat S) \qquad (28)$$



where $\hat\epsilon_i(\hat S)$ is the distance between the graphs of $p_{\bar i}$ and
$p_i$ at a given value in the total support, $\hat S$. Next, focusing on the integrand,
$\hat\iota$, and substituting Eq. 28 into Eq. 27 and recalculating $\hat\iota$, we
arrive at (dropping the subscript on $p_{\bar i}$):



r>( X"'^''' ■ X 

t{T) = p{xr^; X-) iog( 



(29) 



p{xr^)p{xj^y 

+ G{t){N, e,,p{X,{j),X,{j - r)),p{X,{j)),p{X,{j ~ r))) 
= Pir) + G{r) 

where G(r) is explicitly given by: 

1 - 



G(r)= log 




(30) 



(for a more explicit calculation of $\hat G$ and $\hat I$, cf. appendix A 2). Thus, as
the average of the $\hat\epsilon_i$'s goes to zero, $\hat G(\tau)$ will go to zero;
moreover, when both the width of the band of PDFs decreases and when the supports of
the distributions overlap (i.e., when $\cap_{i=1}^{N} S_i \approx \cup_{i=1}^{N} S_i$),
the TDMI of the aggregate population ($\hat I$) will represent an individual within a
homogeneous population (because the individuals within the population are similar).
Similarly, when either the width of the band of PDFs increases or the supports of the
distributions become disjoint (i.e., when $\cap_{i=1}^{N} S_i = \emptyset$),
$\hat I(\tau)$ will represent the TDMI within the diverse population. Or, said
differently, the TDMI for the aggregated population will represent the TDMI of the
population plus the sum of the individual-based differences from the population. As we
will see in the sections that follow, this second circumstance can lead to subtle
difficulties in interpretation. Finally, note that the calculation that yielded
$\hat\iota$ does not explicitly depend on the support; the explicit $\hat\epsilon$'s
will differ between $\hat I(\tau)$ and $\hat I_{\bar S}(\tau)$, but the explicit form
of $\hat\iota$ will not.

1. Entropy of the aggregated population

Again, while the TDMI is the primary topic of this paper, in both the interest of
completeness and later analysis, we define $h_{\hat I}$ for the aggregated population,
which was calculated in analogy with $\hat I$, as follows:



hi = - 



'Mogb-^-^) 



(31) 



N 



N 



In contrast to the situation where the information entropy is averaged over the
population, when the average $\hat\epsilon_i \to 0$, the information entropy for the
aggregated population, $h_{\hat I}$, relative to the real support of the population
tends toward the information contained in an individual who has the most data pairs in
the PDF estimate.



VI. HOW TO INTERPRET THE TDMI FOR A 
POPULATION, OR, TDMI-BASED METHODS 
FOR INTERPRETING POPULATION 
DIVERSITY 

To achieve a practical understanding of the meaning of the TDMI in the context of a
population, we have to combine information from the previous section to construct an
explicitly numerically computable means of interpreting $\bar I(\tau)$ and
$\hat I(\tau)$. Practically speaking, there are two broad situations: (i)
$\bar I(\tau)$ is practically calculable (when $\bar I(\tau)$ is calculable,
$\hat I(\tau)$ always will be); and (ii) $\bar I(\tau)$ is not calculable (usually to
estimate $\bar I(\tau)$ there need to be at least 100 pairs of points per
representative element), leaving us only with $\hat I$-related quantities. Relative to
the first situation, define the difference between $\bar I(\tau)$ and $\hat I(\tau)$,
or



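$$\delta I(\tau) = |\bar I(\tau) - \hat I(\tau)| \qquad (32)$$
$$= \left| \int p_{\bar i}(\bar S) - \int p_{\bar i}(\hat S) \right| + \left| \int \bar G - \int \hat G \right| + (\bar B - \hat B)$$
$$\equiv \delta p + \delta G_I + \delta B$$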

This allows for the following conjecture, which we will not prove in this paper:

Conjecture 1 In the circumstance where $\bar I(\tau)$ can be accurately estimated,
$\delta I(\tau) \approx 0$ if and only if the population used to estimate
$\bar I(\tau)$ and $\hat I(\tau)$ is statistically homogeneous temporally (i.e., the
PDFs representing the individuals in the population are identical, as are the PDFs
under temporal evolution).

The forward direction of the if and only if statement, that $\delta I(\tau) > 0$
implies a heterogeneous population, will be briefly discussed in section VI B; this
direction is more complicated to prove. The reverse direction of the if and only if
statement in this conjecture claims that if the population represents a single,
stationary, homogeneous distribution then $\delta I(\tau) \approx 0$; this claim relies
on the fact that in this circumstance all $\epsilon$'s are zero and thus
$\bar I(\tau)$ (Eq. 22) and $\hat I(\tau)$ (Eq. 26) represent a homogeneous source and
are equivalent up to bias. Essentially, when one can estimate $\delta I(\tau)$, one can
interpret the population make-up without delving deeply into the detailed sources of
the TDMI. In contrast, when only $\hat I(\tau)$ is practically calculable, the
interpretation of $\hat I(\tau)$ can only be understood through understanding the
source of the TDMI. Nevertheless, in general, it is insightful to understand the
sources of the TDMI, and the sources of the TDMI are tied to the make-up of the
population.
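The comparison between the average and aggregate TDMI can be sketched on synthetic data. This is a minimal illustration under stated assumptions, not the estimator used in the paper: it uses a plug-in histogram estimate of mutual information, assumes uniformly sampled series so that a lag is an index offset, and does no bias correction. The names `histogram_mi`, `lagged_pairs`, and `delta_I` are hypothetical.

```python
import numpy as np

def histogram_mi(x, y, bins=8):
    """Plug-in mutual information (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def lagged_pairs(series, lag):
    """Pairs (x(t), x(t + lag)) for a uniformly sampled series."""
    return series[:-lag], series[lag:]

def delta_I(population, lag, bins=8):
    pairs = [lagged_pairs(s, lag) for s in population]
    # Average TDMI: estimate per individual, then average the estimates.
    avg = float(np.mean([histogram_mi(x, y, bins) for x, y in pairs]))
    # Aggregate TDMI: pool all pairs, then estimate once.
    agg = histogram_mi(np.concatenate([x for x, _ in pairs]),
                       np.concatenate([y for _, y in pairs]), bins)
    return abs(avg - agg), avg, agg

rng = np.random.default_rng(1)
homogeneous = [rng.normal(0.0, 1.0, 2000) for _ in range(3)]
shifted = [rng.normal(m, 1.0, 2000) for m in (0.0, 6.0, 12.0)]
d_hom, _, _ = delta_I(homogeneous, lag=1)
d_het, _, _ = delta_I(shifted, lag=1)
```

In this toy setting every series is temporally uncorrelated, so the difference for the shifted population comes entirely from pooling individuals with different supports, while the homogeneous population gives a difference on the order of the estimator bias.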

From a detailed perspective, the make-up of the population is important because the
deviation of the TDMI from the homogeneous case is due to non-zero $\epsilon$'s and
$\hat\epsilon$'s, and the source of non-zero $\epsilon$'s can differ from the source of
non-zero $\hat\epsilon$'s. Specifically, $\epsilon$ can only be non-zero because of
differences between the graphs of the $p_i$'s. This is because all the $p_i$'s for the
average TDMI have the same support. In contrast, the sources of non-zero
$\hat\epsilon$'s due to a heterogeneous population can be split into three broad
categories: (i) differences in the TDMI estimates due to differences in the supports
independent of the graphs of the PDFs; (ii) differences in the TDMI estimates due to
differences in the graphs independent of the supports; and (iii) differences in the
TDMI estimates due to the supports' effect on the graphs.



A. Support dependent, graph independent, effects 
on the population TDMI 

To understand and quantify the differences in the TDMI estimates due to differences in
the supports independent of the graphs of the PDFs, consider the difference between the
random permutation bias estimates defined in section IV B.

First, recall that the population-wide random permutation bias estimate will be
roughly equivalent to the estimator bias, or $B_{PRP}(\tau) \approx B_E(\tau)$,
regardless of the supports or densities of the elements (cf. [12] for small sample size
qualifications of this statement). Next, note that the individual-wise random
permutation bias estimate, $B_{IRP}(\tau)$, represents the bias due to heterogeneity in
the supports plus the estimator bias. Thus, the contribution to the bias due to the
diversity in population normalization is approximated by the difference between the
individual-wise and population-wise random permutation bias estimates:



$$B_{RP}(\tau) = B_{IRP}(\tau) - B_{PRP}(\tau) \qquad (33)$$



FIG. 1: Graphically comparing $\bar p$ (average PDF) and $\hat p$ (PDF of the
aggregate) for three collections of Gaussian random numbers whose distributions have
means 0, 2 and 4, respectively. (a) $p_i(S_i)$ for the three distributions. (b)
$p_i(\hat S)$ for the three distributions as well as
$\hat p = \frac{1}{3} \sum_{i=1}^{3} p_i$.

There are two reasons why the difference $B_{RP}(\tau)$ can be non-zero. First, the
number of points used to calculate the two estimates can differ by orders of magnitude
(say, a population of 1,000 with 10 points each); in this case, $B_{RP}(\tau)$
represents the $1/n$ effect on the bias estimates. In the case where the numbers of
pairs used to estimate $B_{PRP}(\tau)$ and $B_{IRP}(\tau)$ are relatively similar
(e.g., more than 100 and within an order of magnitude; to control for the number of
points, it is easy to reduce the cardinality of the set used to calculate
$B_{PRP}(\tau)$), Fig. 1(b) shows visually how these bias estimates would render
differently. In this context, $B_{IRP}(\tau)$ would be identical to $\hat I$, whereas
randomly permuting the entire population, such as is done to estimate $B_{PRP}(\tau)$,
will result in one of the marginal distributions becoming $\hat p(\hat S)$ (a uniform
distribution instead of three Gaussians with distinct means), thus greatly changing the
amount of mutual information. These effects are primarily support-driven effects; note
that while it is possible that differences in the underlying distribution function can
be rendered through $B_{RP}(\tau)$, differences in the support of those distributions
will always be rendered through $B_{RP}(\tau)$. As we will see in a moment,
$B_{PRP}(\tau) \approx B_{IRP}(\tau)$ is not enough to imply that
$\delta I(\tau) \approx 0$, but is enough to imply that the variance in the boundaries
of the supports will all be relatively small. Nevertheless, while in some circumstances
it may be difficult to use the bias estimates to detect a difference in the average
versus aggregate TDMI, we can use the bias estimates to interpret the average and
aggregate TDMI signal. In particular, when $B_{RP}(\tau) \ll B_E(\tau)$, intermixing
individuals' measurements has no effect on the random permutation bias estimate,
implying that there is very little population selection information in the TDMI
estimate. Thus, $B_{RP}(\tau) \ll B_E(\tau)$ at least implies overlapping distribution
supports. Similarly, when $B_{RP}(\tau) \gg B_E(\tau)$, intermixing elements has a
profound effect on the random permutation bias estimates; in this instance,
$B_{RP}(\tau)$ reveals a bias whose source is the diversity of the supports among the
elements. This leads us to a measure of homogeneity of supports that is readily
computable even for poorly measured populations (e.g., when only $\hat I(\tau)$ is
calculable); the TDMI homogeneity of support is defined by the following equation:



$$\mathcal{H}_S(\tau) = \frac{|B_{IRP}(\tau) - \hat I(\tau)|}{\hat I(\tau)} \qquad (34)$$



The closer $\mathcal{H}_S(\tau)$ is to one, the less the diversity of the supports
over the population; similarly, the closer $\mathcal{H}_S(\tau)$ is to zero, the
greater the diversity of the supports over the population. (Again, note one must
control for the dependence on the number of pairs used to estimate the above
quantities.)
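A homogeneity-of-support ratio of this kind can be sketched as follows. The sketch rests on several assumptions that are not confirmed by the text: the individual-wise permutation is taken to shuffle the time order inside each individual before pooling, the population-wise permutation is taken to shuffle the pooled data, and the ratio is taken to be $|B_{IRP} - \hat I| / \hat I$. The function names, the AR(1) toy data, and the histogram estimator are all hypothetical choices for illustration.

```python
import numpy as np

def histogram_mi(x, y, bins=8):
    """Plug-in mutual information (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def ar1(rng, n, phi, mean):
    """Toy temporally correlated series: AR(1) noise around a fixed mean."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x + mean

def support_homogeneity(population, lag, rng, bins=8):
    xs = [s[:-lag] for s in population]
    ys = [s[lag:] for s in population]
    x_all, y_all = np.concatenate(xs), np.concatenate(ys)
    i_agg = histogram_mi(x_all, y_all, bins)            # aggregate TDMI
    # ASSUMED individual-wise permutation: shuffle inside each individual only.
    y_ind = np.concatenate([rng.permutation(y) for y in ys])
    b_irp = histogram_mi(x_all, y_ind, bins)
    # ASSUMED population-wise permutation: shuffle across the pooled data set.
    b_prp = histogram_mi(x_all, rng.permutation(y_all), bins)
    h_s = abs(b_irp - i_agg) / i_agg
    return h_s, b_irp, b_prp, i_agg

rng = np.random.default_rng(2)
same_support = [ar1(rng, 3000, 0.9, 0.0) for _ in range(3)]
diverse_support = [ar1(rng, 3000, 0.9, m) for m in (0.0, 30.0, 60.0)]
h_same = support_homogeneity(same_support, 1, rng)
h_div = support_homogeneity(diverse_support, 1, rng)
```

In this toy case the ratio is close to one when all individuals share a support and drops toward zero when the supports are disjoint, because the individual-wise permutation then retains most of the aggregate mutual information.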

It is worth noting that a similar analysis can be done by comparing
$\hat I_{\bar S}(\tau)$, the aggregate TDMI computed relative to the abstract support,
to $\hat I(\tau)$, as their difference will reveal support-based effects. The
principles behind $\delta \hat I(\tau) = |\hat I_{\bar S}(\tau) - \hat I(\tau)|$ and
$\mathcal{H}_S(\tau)$ are similar in that they both address normalization of
support-based effects, only $\mathcal{H}_S(\tau)$ depends on quantities that represent
distributions ($B_{IRP}(\tau)$ and $B_{PRP}(\tau)$ can both be estimated many times)
and thus is likely more robust.

B. Graph dependent, support independent, effects
on the population TDMI

To understand in detail how differences in the graphs independent of the supports can
affect $\bar I(\tau)$ and $\hat I(\tau)$, begin by assuming that all the $p_i$'s have
the same support, or that $\cap_{i=1}^{N} S_i = \cup_{i=1}^{N} S_i$. In this
circumstance, $\epsilon_i = \hat\epsilon_i$ for all $i$. Thus, the contribution of the
diversity of PDFs within the population to $I$, or the deviation from the mean at a
particular $x \in S$ value, is captured by $\bar G(\tau)$ and $\hat G(\tau)$ as defined
in Eqs. 24 and 30. Consequently, the only way that $\bar I(\tau)$ can be different from
$\hat I(\tau)$ up to the estimator bias is for the variation in the collections of PDFs
to be due to the order of averaging as rendered through the $G$'s.

Based on the aforementioned intuition, we claim (i.e., conjecture 1) that
$\delta I(\tau)$ is equal to zero if and only if all the $\epsilon$'s are zero. While
we will not present a qualified proof of this claim here, we can offer an intuitive
argument as to why our claim is justified. First note that by inspection, if
$\epsilon_i = 0$ for all $i$, $\delta G(\tau) = \bar G(\tau) = \hat G(\tau) = 0$. Now,
what remains is to understand what happens to the $G$'s when there are non-zero
$\epsilon$'s; to do this, note that we reduce the $G$'s to the terms they do not have
in common:



G(r)~5(T) = (ft.-[e])log( 



Gir) ^ gir) ^ ^ g^. - log( _ ^1 ) 

'(36) 

and then consider the difference in these quantities:

$$\delta G \sim \delta g(\tau) = |\bar g(\tau) - \hat g(\tau)|. \qquad (37)$$

Now, further noting that $g(\tau)$ is convex (or concave, depending on the $p$'s) and
applying standard convexity arguments, $\delta g$ will not equal zero unless
$\epsilon_i = 0$ for all $i$. Thus, while it is possible that, through the act of
integrating the $G$'s, symmetries will allow for the $G$'s to be equal, it is extremely
unlikely. Moreover, because the convexity or concavity of $g(\tau)$ depends on the
nature of the $p$'s, it is difficult to say whether $\bar g(\tau)$ will be, in general,
greater or less than $\hat g(\tau)$. Nevertheless, it appears in computational
experiments that $\bar g(\tau)$ is often less than $\hat g(\tau)$. In any event, it is
now more clear how diversity amongst the distribution of $p$'s over the same support
can (and likely will) force $\delta I(\tau) > 0$.

In the situation where $\bar I(\tau)$ is not accessible, it may not be possible to
fully understand the meaning of $\hat I(\tau)$. While $\mathcal{H}_S(\tau)$ can help
identify support-based effects, pure graph-based temporally dependent effects may be
difficult to estimate. In particular, if the sample size for some of the individuals is
small, then it will be difficult to determine the contribution to $\hat I(\tau)$ due to
purely graphic diversity simply because there will be such high variance in the
graphical PDF estimates due to small sample sizes [28].



N 



1~ ^ 

1 



(35) 



In this case, the best that can be done is to estimate more static measures of graphic
diversity such as those presented in section VII.



C. Support dependent, graph-based effects on the
population TDMI



There are two potential contributors to support dependent, graph-based effects on
$\delta I(\tau)$: $\delta G(\tau)$ and $\delta p(\tau)$.

The contribution to $\delta I(\tau)$ due to $\delta p(\tau)$ is entirely due to the
limits of integration; the integrands for the average and aggregate $p$ components of
the TDMI are identical. Thus, intuitively, $\delta p > 0$ because of the relative
location of the support of $p_{\bar i}$ in reference to the total support of the
population; $p_{\bar i}$ will represent a more peaked distribution when defined on
$\hat S$ compared to $\bar S$. Note that while $\delta p$ is, in general, computable,
it has similar characteristics to $\mathcal{H}_S(\tau)$ with more severe bias issues.

The contribution due to $\delta G_I$ is not as intuitive; to understand how diversity
in the supports contributes to $\delta G_I$ via the induced differences in the
$\epsilon$'s, consider Figs. 1(a) and 1(b). Relative to Fig. 1(a), begin by defining
$\bar p(\bar S)$ as the average of the PDFs relative to the abstract support, or
$\bar p(\bar S) = \frac{1}{3}(p_1(\bar S) + p_2(\bar S) + p_3(\bar S))$; here all the
$\epsilon_i$'s will be small and independent of the support. This is how variation in
the population is rendered when calculating $\bar I$, and thus how $\bar G$ will
render. In contrast, define the average of the PDFs relative to the total support, or
$\hat p(\hat S) = \frac{1}{3}(p_1(\hat S) + p_2(\hat S) + p_3(\hat S))$; this is the
aggregate scenario. Here it is clear that the averaged PDF will not resemble any of the
individual PDFs, whichever $p_{\bar i}$ is selected. Moreover, all $\hat\epsilon_i$'s
will be relatively large and on the order of the various $p_i(\hat S)$'s over a
non-trivial portion of the population support $\cup_{i=1}^{N} S_i$. Because of this,
when the supports of the individuals differ, the largest term in $\hat I(\tau)$,
$\hat G(\tau)$, will be accounting primarily for variation within the distribution of
the supports of the population, rather than support-independent variation within the
population. Moreover, when the supports of the individuals are relatively invariant,
$\hat I$ will be independent of time even when the $I$ of an individual varies with
$\tau$. In any event, the point is that variation in the supports of otherwise
identical distributions affects how the distributions are rendered through the TDMI
calculation.

Finally, when only $\hat I(\tau)$ is available, which implies the presence of
individuals with too few pairs of points to accurately estimate a PDF and thus the
TDMI, and when there are support-dependent graph-based effects in the TDMI, it will
likely be difficult to separate the support dependent, graph-based effects from the
support independent graph-based effects on the TDMI (e.g., on the structure of the
temporal correlation).

VII. NON-TDMI-BASED METHODS FOR
INTERPRETING POPULATION DIVERSITY 

In this paper, we claim that the TDMI-based analysis can be used both to detail
nonlinear correlation in time and to interpret the composition of the population to
which that correlation pertains (i.e., whether the TDMI reflects an
individual/homogeneous population or a diverse population). To verify this claim, we
require a set of methods for establishing a baseline that are independent of
information-theoretic machinery and can be used to interpret the make-up of the
population. We propose three different quantifications of homogeneity of a population:
(i) homogeneity in measurement representation, which addresses the variance in the
distribution of the number of measurements per element of the population; (ii)
homogeneity in support, which addresses variation in the supports of each element's
distribution; and (iii) homogeneity in density, which addresses variation in the PDFs
(or the graphs of the PDFs) over the population. Note that all but one of the methods
for quantifying homogeneity are independent of time, and all are independent of any
time-based correlation structure existent within the data set. Moreover, the
homogeneity quantification methods we propose here are neither exhaustive nor
particularly innovative; rather they are simple intuitive methods devised to interpret
and confirm the TDMI-based results. Nevertheless, many of these methods are useful in
their own right; moreover, at least one of the quantities we define here is required to
supplement the TDMI analysis when very few measurements exist per individual. Finally,
table I contains a summary of the ten TDMI-independent quantities we use to verify the
TDMI methodology.



A. Homogeneity in measurement composition 

To quantify homogeneity in measurement composition, 
begin with the following two quantities. First, consider 
the difference between the mean of the raw measurements 
over the population versus the mean of the individual- 
wise measurement means, or: 




(38) 



non-TDMI-based quantities for characterizing a population

Quantity | Definition | Interpretation
$H_{\bar x}$ | difference between the population and individual element means | $\approx 0$ implies either (i) most elements have a similar number of measurements, or (ii) the individuals come from distributions with similar means; $\gg 0$ implies the converse
$V(f(n))$ | variance of the PDF of the number of measurements per individual | (i) $V \approx 0$, $H_{\bar x} \approx 0$ imply elements were measured similarly; (ii) $V \gg 0$, $H_{\bar x} \approx 0$ implies elements measured at different rates; (iii) $V \gg 0$, $H_{\bar x} \gg 0$ implies elements measured at different rates with differing source distributions
$\bar s_{\min}$ | $E[s_{\min}(i)]$ | lower support boundary mean
$\bar s_{\max}$ | $E[s_{\max}(i)]$ | upper support boundary mean
$\mathrm{Var}(s_{\min})$ | | lower support boundary variance
$\mathrm{Var}(s_{\max})$ | | upper support boundary variance
$\overline{|s|}$ | $\bar s_{\max} - \bar s_{\min}$ | length of support mean
 | | length of support variance
$H_{RA}$ | area between the (point-wise) least and greatest PDF graph | quantifies variance between the PDFs of the population; $\approx 0$ implies element PDFs are homogeneous; very sensitive
$V_S(p)$ | $\int_S E[p(x)^2] - E[p(x)]^2\, dx$, variance of the PDFs relative to a specified support, $S$ | $\approx 0$ implies homogeneity in PDFs; larger $V_S(p)$ implies greater heterogeneity in the PDFs
$V_{\hat S}(p)$ | $V_S(p)$ calculated relative to the support of the aggregate population, $\hat S = \cup_{i=1}^{N} S_i$; note that there does exist an aggregate normalized support, but we will not use this quantity here | $V_{\hat S}(p)$ has the same interpretation as $V_S(p)$ in general, but has the potential to include support-based effects
$V_{\bar S}(p)$ | $V_S(p)$ calculated relative to the abstract support of the population, $\bar S$ | $V_{\bar S}(p)$ has the same interpretation as $V_S(p)$ in general, but excludes support-based effects



TABLE I: Summary of all the non-TDMI based metrics used to assess homogeneity in a
population (both among the graphs and the supports) used to verify the TDMI-type
analysis.

In Eq. 38, $n_k$ is the number of points contributed by individual $k$, $N$ is the
number of individuals in the population, and $n_0 = 0$. Now, $H_{\bar x} \approx 0$
under two circumstances: (i) the distribution of $n_k$'s has zero or small variance,
regardless of the collection of individual distributions; or (ii) each individual comes
from an identical distribution. Second, consider the variance of the probability
density of the number of measurements per individual:

$$V_{f(n)} = \mathrm{Var}(f(n)) \qquad (39)$$

where $f(n)$ denotes the density of measurements per individual. Combining these two
quantities we arrive at three cases: (i) $V_{f(n)} \approx 0$ implies that
$H_{\bar x} \approx 0$, together implying that the elements were measured similarly, so
no insight into the original distributions can be made; (ii) $V_{f(n)} \gg 0$ and
$H_{\bar x} \approx 0$ together imply that the elements were measured at different
rates regardless of their source distributions (which can be identical); and (iii)
$V_{f(n)} \gg 0$ and $H_{\bar x} \gg 0$ together imply that the elements were measured
at different rates and likely have differing source distributions. Note that, in
general, both of these metrics are rather sensitive to diversity in a population.
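The two measurement-composition quantities can be illustrated with a short sketch. It assumes, as a simplification, that the first quantity is the absolute difference between the pooled mean and the mean of the individual means, and that the second can be summarized by the variance of the per-individual measurement counts; the function name `measurement_composition` and the toy data are hypothetical.

```python
import numpy as np

def measurement_composition(population):
    """population: list of 1-D arrays, one array of raw measurements per individual."""
    counts = np.array([len(s) for s in population], dtype=float)
    # Mean of the raw measurements pooled over the whole population.
    pooled_mean = float(np.concatenate(population).mean())
    # Mean of the individual-wise measurement means.
    mean_of_means = float(np.mean([np.mean(s) for s in population]))
    h_x = abs(pooled_mean - mean_of_means)
    # ASSUMPTION: variance of the counts n_k stands in for Var(f(n)).
    v_fn = float(np.var(counts))
    return h_x, v_fn

balanced = [np.array([1.0, 2.0, 3.0]), np.array([7.0, 8.0, 9.0])]
unbalanced = [np.array([1.0, 2.0, 3.0]), np.array([8.0])]

h_bal, v_bal = measurement_composition(balanced)      # equal counts
h_unb, v_unb = measurement_composition(unbalanced)    # unequal counts, different means
```

The balanced population corresponds to case (i), with both quantities equal to zero, while the unbalanced population has both quantities non-zero, as in case (iii).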

B. Homogeneity in measurement distribution 
supports 

To characterize homogeneity in distribution support we rely on a brute force
homogeneity characterization technique. Begin by recalling that the support for element
$i$'s distribution is $S_i = [s_{\min}(i), s_{\max}(i)]$. Given these sets, which are
defined by the individuals' measurements, define the mean and variance of the support
minima, maxima, and length by:





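$$\bar s_{\min} = E[s_{\min}(i)] \qquad (40)$$
$$\bar s_{\max} = E[s_{\max}(i)] \qquad (41)$$
$$V_{s_{\min}} = \mathrm{Var}(s_{\min}) \qquad (42)$$
$$V_{s_{\max}} = \mathrm{Var}(s_{\max}) \qquad (43)$$
$$\overline{|s|} = \bar s_{\max} - \bar s_{\min} \qquad (44)$$
$$V_{|s|} = \mathrm{Var}(s_{\max} - s_{\min}) \qquad (45)$$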



These quantities afford relatively simple interpretations. For instance, when the
minima, maxima, and lengths for the population have small variance, the intersection of
the supports will not differ significantly from the union of the supports, meaning the
supports overlap. In contrast, a large variance in any of the minima, maxima, or
lengths implies that the supports differ significantly over the population.
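These support statistics are straightforward to compute from the per-individual minima and maxima. The following sketch uses a hypothetical helper name, `support_statistics`, and a toy population; it takes each individual's support to be the closed interval between its smallest and largest measurement.

```python
import numpy as np

def support_statistics(population):
    """Mean and variance of the support minima, maxima and lengths over individuals."""
    s_min = np.array([np.min(s) for s in population], dtype=float)
    s_max = np.array([np.max(s) for s in population], dtype=float)
    length = s_max - s_min
    return {
        "mean_min": float(s_min.mean()),
        "mean_max": float(s_max.mean()),
        "var_min": float(s_min.var()),
        "var_max": float(s_max.var()),
        "mean_length": float(length.mean()),   # equals mean_max - mean_min
        "var_length": float(length.var()),
    }

# Three individuals with equally long but shifted supports: [0,2], [2,4], [4,6].
population = [np.array([0.0, 1.0, 2.0]),
              np.array([2.0, 3.0, 4.0]),
              np.array([4.0, 5.0, 6.0])]
stats = support_statistics(population)
```

Here the lengths have zero variance while the minima and maxima have large variance, so the supports share a common width but do not overlap, which is the second situation described above.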

C. Homogeneity in the distribution of the graphs 
of the measurement PDFs 

To specify homogeneity in the PDF of the population we will use two methods.
Intuitively, both methods characterize, in one way or another, the width of the band
between the maximum and minimum PDFs of the population over the support of the entire
population. Begin by defining the PDF for an individual by $p_i(x)$, the supremum of
the PDFs of the population by $\max_i(p_i(x)) = p_M(x)$, and the infimum of the PDFs of
the population by $\min_i(p_i(x)) = p_m(x)$, over the union of the supports,
$\hat S = \cup_{i=1}^{N} S_i$. First, using the $L_1$ (pseudo) distance [25] we can
define the relative area of the width of the band of PDFs by:

$$H_{RA} = \frac{\int_{\hat S} |p_M(x) - p_m(x)|\, dx}{\int_{\hat S} p_M(x)\, dx} \qquad (46)$$

The relative area, $H_{RA}$, is literally the proportion of the area under the
supremum of the collection of PDFs that lies above the infimum of the collection of
PDFs. When $H_{RA}$ is close to one, the maximum distance between PDFs over the
population occupies all the volume of the population-wide PDF. In other words, the
population has at least two substantially different PDFs. Similarly, when $H_{RA}$ is
near zero, this implies that the proportion of the area between the supremum and
infimum over the collection of $p_i$'s relative to the total area occupied by the
supremum of the $p_i$'s over the population is very small. Thus the implication of
$H_{RA}$ being near zero is that the $p_i$'s are all nearly identical. However, this
method is very sensitive to heterogeneity; a single individual's PDF differing from the
rest of the population can maximize $H_{RA}$ at one. In contrast, the second method for
evaluating the diversity in PDFs over the population quantifies diversity from a mean
within the population by estimating the variance of the PDFs at a given $x$ integrated
over a given support ($S$), or

$$V_S(p) = \int_S \left( E[p(x)^2] - E[p(x)]^2 \right) dx \qquad (47)$$

Note that $V_S(p)$ can be estimated relative to two different supports, the union of
the supports or the abstract support. This is an $L_2$ flavored representation of the
variation in PDFs; the variance of the $p_i$'s at a given $x$ is maximized when the
$p_i$'s are maximally orthogonal (in the sense of an inner product between the
$p_i$'s) to one another, and minimized when the $p_i$'s are minimally orthogonal
(meaning they coincide). Thus, $V_S(p)$ has the potential to capture both support- and
graph-based variation, depending on whether $V$ is calculated relative to $\hat S$,
which will include support-based effects, or $\bar S$, which will not include
support-based effects.

VIII. ASSEMBLING THE PIECES: AN 
EXPLICIT PRESCRIPTION FOR TDMI 
ANALYSIS AND INTERPRETATION FOR A 
POPULATION OF TIME SERIES FOR A FIXED 
TIME SEPARATION $\delta t$

The interpretation of the TDMI and entropy for a complex, diversely measured
population can be split into three broad steps: (i) performing a preliminary
interpretation of $\bar I(\delta t)$ and $\hat I(\delta t)$; (ii) performing an
interpretation of $\delta I(\delta t)$ or $\hat I(\delta t)$ for the population; and
(iii) understanding the make-up of the data explicitly used to estimate the PDFs,
yielding an understanding of what proportion of the population was used in the
calculation. All the TDMI quantities used for the TDMI-based analysis are shown in
table II, a graphical schematic for applying this infrastructure is shown in Fig. 2,
and a detailed algorithmic schematic for applying the TDMI infrastructure to a
population is depicted via pseudocode in the appendix.

FIG. 2: The graphical schematic for the TDMI analysis of a population. [Flowchart
titled "How to interpret the TDMI for a population of time series beginning with all
the data pairs with a given $\delta t$". Recoverable decision nodes: estimate
$N_{\min}(\delta t)$ and test $N_{\min}(\delta t) < 100$; (A) compare $\delta I$ or
$\hat I$ with $B_{IRP}(\delta t)$ and estimate $\mathcal{H}_S(\delta t)$ ($\approx 0$
or $\approx 1$); (B) estimate $H_\theta(\delta t)$ ($\approx 1$: population uniformly
represented; $\approx 0$: portions of population overrepresented). Recoverable outcome
labels: homogeneous population, TDMI present; heterogeneous population, supports
(ranges) uniform, TDMI present; heterogeneous population, supports (ranges) diverse,
TDMI present; heterogeneous population, diverse graphs, TDMI present; heterogeneous
population, diverse supports, homogeneous graphs, TDMI present; heterogeneous
population, diverse supports, diverse graphs, TDMI present.]


A. Step one: determining the computability of I{5t) 

To begin, one must determine whether the population-averaged TDMI, $\bar{I}(\delta t)$, and the aggregate TDMI, $\hat{I}(\delta t)$, are calculable for a given $\delta t$ (or set of $\delta t$'s). In general, to estimate $\bar{I}(\delta t)$ every representative individual must (under most circumstances) have at least 100 pairs of points available for the TDMI calculation [19]. Similarly, to estimate $\hat{I}(\delta t)$ there must be at least 100 pairs of points gathered over the entire population; this is why $\hat{I}(\delta t)$ is so useful in the context of a population.
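The computability check described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the dictionary layout, and the individual identifiers are hypothetical, and the threshold of 100 pairs is the rule of thumb quoted in the text.

```python
from typing import Dict, Union

MIN_PAIRS = 100  # rule-of-thumb threshold quoted in the text


def tdmi_computability(
    pairs_per_individual: Dict[str, int], min_pairs: int = MIN_PAIRS
) -> Dict[str, Union[bool, int]]:
    """Report whether the averaged and the aggregate TDMI can be estimated.

    pairs_per_individual maps a (hypothetical) individual id to the number
    of data pairs that individual contributes at the chosen time separation.
    """
    counts = list(pairs_per_individual.values())
    n_total = sum(counts)
    n_min = min(counts) if counts else 0
    return {
        # every individual needs enough pairs for its own PDF estimate
        "average_computable": bool(counts) and n_min >= min_pairs,
        # the pooled population only needs enough pairs in total
        "aggregate_computable": n_total >= min_pairs,
        "n_min": n_min,
        "n_total": n_total,
    }
```

For example, two individuals with 20 and 90 pairs fail the per-individual requirement but pass the pooled one, which is the situation in which only the aggregate TDMI is available.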

Assuming that $\bar{I}(\delta t)$ is calculable, because the calculation of $I$ for an individual is independent of the support of the distribution, the variance in the distribution of the individual $I(\delta t)$ values is due to differences in the graphs of the PDFs representing each patient at a given $\delta t$. Further, because $\bar{I}(\delta t)$ is made of individuals who have been averaged, the interpretation of the statistical moments of $\bar{I}(\delta t)$ (i.e., the mean, variance, etc.) is a scientific problem that depends on the particular circumstances.

The interpretation of $\hat{I}(\delta t)$ is more difficult because $\hat{I}(\delta t)$ can be composed of purely graphical, purely support, and intermixed support and graphical components. Thus, because $\hat{I}(\delta t)$ is a population-dependent quantity where the individual contributions cannot be separated, it will be treated in the next section with $\delta I(\delta t)$.

B. Step two (A in Fig. 2): interpreting $\delta I(\delta t)$ or $\hat{I}(\delta t)$

Step two has two courses of action depending on whether it is possible to calculate $\bar{I}(\delta t)$ or not: (i) $\bar{I}(\delta t)$ and $\hat{I}(\delta t)$ are calculable and thus $\delta I(\delta t)$ can be computed; and (ii) only $\hat{I}(\delta t)$, $B_{RP}(\delta t)$, and $\mathcal{H}_S(\delta t)$ are calculable (when $\hat{I}(\delta t)$ is calculable, this will always be the case). When $\delta I(\delta t)$ is available, it, as estimated by both a KDE and a histogram estimator, is all we need to know: the closer $\delta I(\delta t)$ is to zero, the more homogeneous the population is and the more $\hat{I}(\delta t)$ represents a single, statistically singular source; the larger in magnitude $\delta I(\delta t)$ is, the more statistically heterogeneous the population is and the more $\hat{I}(\delta t)$ represents the population. Of course, if the histogram and KDE TDMI estimates differ substantially, it is likely that there are significant small sample size effects present in $\bar{I}(\delta t)$, and this needs to be taken into consideration when interpreting $\delta I(\delta t)$, $\bar{I}(\delta t)$ and $\hat{I}(\delta t)$. Moreover, in this circumstance, calculation of either $B_{RP}(\delta t) = \lvert B_{IRP}(\delta t) - B_{PRP}(\delta t) \rvert$ or $\mathcal{H}_S(\delta t)$ can be used to further qualify the small sample size effects on the variation in the supports versus the graphs. Finally, when $\delta I(\delta t)$ is positive and $\mathcal{H}_S(\delta t)$ shows no diversity due to the supports, then all the diversity in the population is due to graph-based diversity.

TDMI-based analysis quantities

| Quantity | What it signifies | What it quantifies |
|---|---|---|
| $\bar{I}(\delta t)$ | population averaged TDMI | quantifies average TDMI of a population |
| $\hat{I}(\delta t)$ | aggregated population TDMI | quantifies TDMI of an aggregated population |
| $\hat{I}(\delta t)$ relative to $S$ | aggregated population TDMI calculated relative to the abstract support $S$ | support independent TDMI of an aggregated population |
| $\delta I(\delta t)$ | $\lvert \hat{I}(\delta t) - \bar{I}(\delta t) \rvert$; difference between the average and aggregate TDMI | $\approx 0$ implies homogeneity, $> 0$ implies heterogeneity |
| $B_E(\delta t)$ | PDF estimator bias; usually $B_E(\delta t) \approx B_{PRP}(\delta t)$; $B_E(\delta t)$ can be estimated in a variety of ways | the number above which the TDMI is considered to be positive |
| $\bar{B}_{IRP}(\delta t)$ | individual permutation bias averaged over a population | bias estimate that preserves information about the relative ranges of individuals |
| $B_{IRP}(\delta t)$ | individual permutation bias | bias estimate that preserves information about the relative ranges of individuals |
| $B_{PRP}(\delta t)$ | population permutation bias | bias estimate that destroys information about the relative ranges of individuals |
| $\mathcal{H}_S(\delta t)$ | $\lvert B_{IRP}(\delta t) - \hat{I}(\delta t) \rvert / \hat{I}(\delta t)$; quantifies diversity of supports | $\approx 1$ implies homogeneous supports; $\approx 0$ implies diverse supports |
| $B_{RP}(\delta t)$ | $\lvert B_{PRP}(\delta t) - B_{IRP}(\delta t) \rvert$; quantifies diversity of supports; quantifies cardinality of individual data sets | $\approx B_{IRP}(\delta t)$ can imply diverse supports or cardinality of per-element data sets; $\approx 0$ can imply homogeneity in supports |
| $\delta G(\delta t)$ | difference in the difference between how population diversity renders in $\bar{I}$ and $\hat{I}(\delta t)$ | $> 0$ implies population diversity |
| $\delta B(\delta t)$ | quantifies diversity in supports | $> 0$ implies population diversity |
| $\mathcal{H}_Q(\delta t)$ | how representative the population used to estimate the TDMI at $\delta t$ is of the time-independent (e.g., the entire) population | $\approx 0$ implies the entire population is well represented; $\approx 1$ implies portions of the population are overrepresented |
| $N_{min}(\delta t)$ | minimum number of pairs of points contributed by any one individual | a lower bound on the representation of an individual; $1/N_{min}(\delta t)$ is a rough estimate of $B_E(\delta t)$ for the individual with the fewest pairs |

When $\bar{I}(\delta t)$ is not calculable, one is left with only the aggregate TDMI quantities and $B_{RP}(\delta t)$ or $\mathcal{H}_S(\delta t)$. In this case, one can still use $B_{RP}(\delta t)$ or $\mathcal{H}_S(\delta t)$ to detect the homo- or heterogeneity in the supports. If there is no support-based variation, then pure graph-based heterogeneity may be difficult to determine; in this circumstance we recommend using a non-TDMI metric such as $V_\sigma(p)$, which will have greater statistical power while sacrificing temporal dependence, to help determine the graphical composition of the population. In general, if there is support-based variation, it will likely be difficult to separate support-based from graph-based contributions; it will be even more difficult to specify the proportion of diversity contributed by the support- versus graph-based effects.
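To make the comparison between the averaged and the aggregate TDMI concrete, the following is a minimal sketch using a plug-in histogram estimator. It rests on several assumptions that are not taken from the text: the function names are hypothetical, the estimate is in nats, the bin count is arbitrary, and the aggregate estimate pools lagged pairs formed within each individual (no pair crosses from one individual to the next).

```python
import numpy as np


def histogram_mi(x, y, bins=8):
    """Plug-in (histogram) mutual information estimate, in nats."""
    joint, _, _ = np.histogram2d(np.asarray(x, float), np.asarray(y, float), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    mask = pxy > 0                       # skip empty cells to avoid log(0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))


def lagged_pairs(series, lag=1):
    """Pairs (x_t, x_{t+lag}) from a regularly sampled series."""
    s = np.asarray(series, dtype=float)
    return s[:-lag], s[lag:]


def average_and_aggregate_tdmi(population, lag=1, bins=8):
    """Return (averaged TDMI, aggregate TDMI, absolute difference)."""
    per_individual = [histogram_mi(*lagged_pairs(s, lag), bins=bins) for s in population]
    i_bar = float(np.mean(per_individual))
    x = np.concatenate([lagged_pairs(s, lag)[0] for s in population])
    y = np.concatenate([lagged_pairs(s, lag)[1] for s in population])
    i_hat = histogram_mi(x, y, bins=bins)
    return i_bar, i_hat, abs(i_hat - i_bar)
```

A KDE-based estimator could be substituted for `histogram_mi` without changing the rest of the sketch; comparing the two estimators is the small-sample check the text recommends.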



C. Step three (B in Fig. 2): assessing population
representation

Finally, it is extremely important to understand what portions of the population actually have points in a given $\delta t$ bin. Recall that the make-up of the population used to estimate the TDMI at a specific $\delta t$ is a concern because of the filtering effect (cf. section IV C); specifically, it is possible to have entire portions of the population excluded from the data set as well as a highly nonuniform distribution of the population represented in the data set used to estimate the PDFs. Written differently, it is important to remember that $\delta I$ is always calculated relative to a fixed $\delta t$, which will have a particular bin population; when studying the evolution of the TDMI as $\delta t$ is varied, the representative population can change as $\delta t$ changes. Thus, it is important to at least calculate $\mathcal{H}_Q(\delta t)$ or an $\mathcal{H}_Q$-like quantity to verify what proportion of the population is being included in the PDF estimate. Moreover, we also find it convenient to keep track of the minimum (and sometimes maximum) number of pairs of points contributed by an element represented in the data set used to estimate the PDFs; we denote this number by $N_{min}(\delta t)$ as a measure of the least representative individual.
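The bookkeeping described above can be illustrated as follows. This sketch is an assumption-laden stand-in: it does not compute $\mathcal{H}_Q$ (whose definition is given elsewhere in the paper) but reports simpler proxies, namely the fraction of individuals that contribute any pair to a $\delta t$ bin, the minimum and maximum pair counts among contributors, and each individual's share of the pooled pairs. The function names, the bin half-width parameter, and the data layout are hypothetical.

```python
import numpy as np


def pairs_in_bin(times, values, dt, half_width):
    """Value pairs (earlier, later) whose time gap lies within dt +/- half_width."""
    order = np.argsort(times)
    t = np.asarray(times, dtype=float)[order]
    v = np.asarray(values, dtype=float)[order]
    i, j = np.triu_indices(len(t), k=1)  # all index pairs with i < j
    keep = np.abs((t[j] - t[i]) - dt) <= half_width
    return np.column_stack((v[i][keep], v[j][keep]))


def bin_representation(population, dt, half_width):
    """Summarize who contributes pairs to one time-separation bin.

    population maps an individual id to a (times, values) tuple.
    """
    counts = {pid: len(pairs_in_bin(t, v, dt, half_width))
              for pid, (t, v) in population.items()}
    contributing = [c for c in counts.values() if c > 0]
    total = sum(contributing)
    return {
        "fraction_represented": len(contributing) / len(counts) if counts else 0.0,
        "n_min": min(contributing) if contributing else 0,
        "n_max": max(contributing) if contributing else 0,
        "shares": {pid: c / total for pid, c in counts.items()} if total else {},
    }
```

Because all pairs of measurements are enumerated, memory grows quadratically with the number of measurements per individual; for very long records a sliding-window search over the sorted times would be preferable.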



IX. QUANTITATIVE EXAMPLES FOR TDMI 
INTERPRETATION AND POPULATION 
HOMOGENEITY EVALUATION 

A. Simulated data examples: the quadratic map 
and the Gauss map 

To explicitly demonstrate how to interpret $\bar{I}$ and $\hat{I}$ in the presence of a diverse population in a variety of circumstances, consider two sources of simulated data, the quadratic map

$$x_{t+1} = f(x_t) = a x_t (1 - x_t) \qquad (48)$$

where $a$ is set to 4, and the Gauss map

$$x_{t+1} = g(x_t) = \frac{1}{x_t} \bmod 1. \qquad (49)$$

TABLE II: Summary of all the TDMI-based metrics used to interpret the TDMI and determine the population composition.

These sources were chosen because their statistical structures are well understood [53], [23], [5], they are chaotic, they are both 1-dimensional maps defined over the unit interval (meaning they have the same support), and they have relatively different invariant densities. Figure 3 shows the graphs of the quadratic and Gauss maps, their individual invariant densities (PDFs of the orbit), and the sum of their invariant densities. Thus, in this context, the difference between $p_f$ and $p_g$, $\epsilon(x)$, is both large enough that the $G$'s will be non-zero and is non-uniform over the domain, or nonlinearly dependent on $x$. The data sets we will use, based on the maps above, include:

Data set 1: Quadratic map time-series with 1000 points; this is one of the data sets meant as a baseline against which all the other cases can be compared.

Data set 2: Gauss map time-series with 1000 points; this is one of the data sets meant as a baseline against which all the other cases can be compared.

Data set 3: Data sets 1 and 2 concatenated into a single data set with 2000 data points; this data set is used primarily to test the effects of differing PDFs within a population on $\epsilon$, $G$, and thus $\bar{I}$ versus $\hat{I}$.

Data set 4: 50 independent, concatenated quadratic map time-series with 20 points each, totaling 1000 points; this data set is meant to highlight the effect of the estimator bias when calculating $\bar{I}$ versus $\hat{I}$.

Data set 5: 10 independent, concatenated quadratic map time-series with 100 points each, totaling 1000 points; this data set is meant to form a baseline for data set 6.

Data set 6: 10 independent, concatenated quadratic map time-series with 100 points each, with disjoint supports with increasing means, totaling 1000 points; this data set is used to demonstrate the effect of diverse supports amongst a population where the PDFs are identical on $\epsilon$, $G$, $B$, and thus $\bar{I}$ versus $\hat{I}$.

FIG. 3: (a) The graphs of the quadratic map (Eq. 48) and the Gauss map (Eq. 49); note the significant difference between the graphs of the mappings. (b) KDE of the invariant density (PDF of the orbit) for the quadratic map, the Gauss map, and the sum of the quadratic and Gauss maps; note the significant differences between the relative $p$'s.

Each data set will be denoted by $D_i$, where $i$ is the index of the respective data set.

Finally, to save space, we will demonstrate the TDMI 
and non-TDMI-based computations on all the simulated 
data sets at one time. We will adhere to the algorithm 
shown in Fig. [2] when analyzing the real data sets. 
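For readers who wish to reproduce data of this kind, the six simulated data sets can be generated along the following lines. This is a sketch under stated assumptions: the initial conditions are drawn at random (the text does not specify them), the disjoint supports of data set 6 are produced by shifting series $k$ by $2k$ (an arbitrary choice that guarantees disjointness), and the multi-series data sets are kept as lists of series so that both the per-individual and the concatenated forms are available. All function and key names are hypothetical.

```python
import numpy as np


def quadratic_map(x0, n, a=4.0):
    """Orbit of x_{t+1} = a x_t (1 - x_t), Eq. (48)."""
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        x[t + 1] = a * x[t] * (1.0 - x[t])
    return x


def gauss_map(x0, n):
    """Orbit of x_{t+1} = (1 / x_t) mod 1, Eq. (49)."""
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        # g(0) is taken to be 0 so the orbit stays defined if it hits zero
        x[t + 1] = (1.0 / x[t]) % 1.0 if x[t] > 0.0 else 0.0
    return x


def build_datasets(seed=0):
    rng = np.random.default_rng(seed)

    def start():
        # random initial condition away from the end points (an assumption)
        return rng.uniform(0.05, 0.95)

    d1 = quadratic_map(start(), 1000)
    d2 = gauss_map(start(), 1000)
    return {
        "D1": d1,
        "D2": d2,
        "D3": np.concatenate((d1, d2)),
        "D4": [quadratic_map(start(), 20) for _ in range(50)],
        "D5": [quadratic_map(start(), 100) for _ in range(10)],
        # shift series k by 2k: supports [2k, 2k + 1] are disjoint, means increase
        "D6": [quadratic_map(start(), 100) + 2.0 * k for k in range(10)],
    }
```

Calling `np.concatenate(datasets["D4"])` yields the concatenated 1000-point form described in the text.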



1. TDMI-based analysis of the simulated data
TDMI-based quantities (all evaluated at $\tau = 1$)

| Source | $\bar{I}$ | $\hat{I}$ | $\bar{B}_{IRP}$ | $B_{IRP}$ | $B_{PRP}$ | $B_{RP}$ | $\mathcal{H}_S$ | $\delta B$ | $\delta G$ | $\delta I$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $D_1$ | 0.72 | | 0.008 | 0.008 | 0.008 | | 0.99 | | | |
| $D_2$ | 0.31 | | 0.012 | 0.012 | 0.012 | | 0.96 | | | |
| $D_3$ | 0.52 | 0.37 | 0.01 | 0.008 | 0.007 | 0.001 | 0.98 | | 0.15 | 0.15 |
| $D_4$ | 0.34 ± 0.07 | 0.71 | 0.18 ± 0.03 | 0.013 | 0.011 | 0.002 | 0.98 | | $\delta I$ | 0.37 ± 0.07 |
| $D_5$ | 0.48 ± 0.01 | 0.71 | 0.04 ± 0.01 | 0.006 | 0.007 | 0.001 | 0.99 | | $\delta I$ | 0.24 ± 0.01 |
| $D_6$ | 0.48 ± 0.01 | 1.12 | 0.04 ± 0.01 | 1.12 | 0.011 | 1.11 | | unknown | unknown | 0.55 ± 0.01 |

TABLE III: TDMI results and homogeneity metrics for the simulated data sets one through six.



Base cases: testing the TDMI-based metrics on individuals. In Table III one can see that the quadratic and Gauss maps have distinctly different $I(\tau = 1)$ values. Note that the Gauss map has a faster decay in correlations; for both maps, all correlations in time decay by $\tau = 6$. Further notice that all bias estimation schemes are essentially identical, as expected. This also implies that support-variation detecting quantities such as $\mathcal{H}_S$ register no variation in supports.

Support dependent, graph independent analysis. To see how diverse supports are rendered, consider the contrast between $D_5$ and $D_6$, whose only difference is in the location of the supports. Both of the support-based TDMI metrics, $B_{RP}$ and $\mathcal{H}_S$, produced dramatic representations of the disjoint nature of the supports of data set six (cf. Table III). Notably, the differences in both $B_{RP}$ and $\mathcal{H}_S$ between $D_5$ and $D_6$ are near their respective maxima.

Graph dependent, support independent analysis. Data set three, the quadratic-Gauss aggregated data set, has homogeneity in support in all support-based metrics, as can be seen in Table III. In particular, both $\mathcal{H}_S$ and all the random permutation bias estimates are totally unaffected by the existence of $\epsilon$, or $\epsilon \neq 0$. Furthermore, $\delta I \neq 0$, meaning that the population averaged TDMI and the TDMI of the aggregated population were different. In particular, $\bar{I} > \hat{I}$, thus leading to the conclusion that $\bar{G} > \hat{G}$, which is not surprising given that when the $\ldots = \ldots$ for all $i$, it is reasonable that the $\epsilon$'s register greater through the sum than the aggregate. In any event, all the TDMI-based metrics registered the diversity in the population of PDFs.

Support dependent graph-based analysis. To begin to see how support and graph effects intermix, consider $\hat{I}$ for a data set identical to $D_6$ except where the quadratic data has been replaced with uniform random numbers, thus yielding data with purely population location information; denote this data set as $D'_6$. Now, $\hat{I}(D'_6) \approx 1.16 \pm 0.01$; thus, comparing $\hat{I}(D_6)$ to $\hat{I}(D'_6)$, we notice that the presence of intra-agent time-based correlation decreases the population scale TDMI by a small but measurable amount; here $\lvert \hat{I}(D_6) - \hat{I}(D'_6) \rvert \approx 0.04$. Therefore, while nearly all the intra-agent TDMI is subsumed by the inter-agent TDMI, when there is a presence of both strong intra-agent information and strong inter-agent information (i.e., highly disjoint supports), $\hat{I}$ will contain both intra-agent and inter-agent components.



What the example in the previous paragraph shows is that deducing the contribution of the intra-agent and inter-agent components to $\hat{I}$ will, in many cases, be non-trivial. Nevertheless, the use of metrics that detail the PDF variation can sometimes aid in the interpretation of $\hat{I}$. First, considering how the heuristic metrics of PDF variation render the variation in PDFs, note that both the super sensitive $H_{RA}$ and the more robust, less sensitive $V_\sigma(p)$, for $D_6$, are about double their values for $D_5$, even though $D_5$ will yield considerably noisier PDF estimates. Similarly, the TDMI metrics for PDF variation also render population diversity; $\delta I$ for $D_6$ is more than twice $\delta I$ for $D_5$. However, $\delta I$ for $D_6$ has a slightly more complicated interpretation. In particular, while $\delta I$ represents the difference between the population and the individual TDMI, there is likely a non-trivial component of $\bar{I}$ that is a function of sample size. Thus, $\delta I$ is not purely the difference between the individual and the population TDMI for unlimited data as it was for $D_3$. Nevertheless, because $\hat{I} \gg B_E(D_6)$ and $\delta I \gg B_E(D_6)$, we know that $\hat{I}$ has components of both individual and population scale TDMI. In fact, considering $\lvert \hat{I}(D_5) - \hat{I}(D_6) \rvert \approx 0.41$ versus $\lvert \hat{I}(D_5) - \hat{I}(D'_6) \rvert \approx 0.44$, one can see that for this case the TDMI whose source is in the population dominates; presumably, if the supports for $D_6$ were nearly overlapping instead of disjoint, $\lvert \hat{I}(D_5) - \hat{I}(D_6) \rvert$ would be much closer to zero. While it is unusual to be able to compare identical, stationary systems with differing supports, this analysis does suggest that calculating $\hat{I}$ for the raw data and for the data with normalized supports may be useful for determining the proportion of $\hat{I}$ that is due to the diversity of the supports.
2. Non-TDMI-based analysis of the simulated data



Base cases: testing the non-TDMI metrics on individuals. Begin by considering $D_1$ and $D_2$, both of which represent only a single individual. Both cases are well defined in $p$ (cf. Fig. 3), and have supports whose lengths, $\lvert S \rvert$, and boundaries, $S_{min}$ and $S_{max}$, are well resolved and within their expected ranges (cf. Table IV).

Support dependent, graph independent analysis. To see how variations in the supports are rendered, consider the contrast between $D_5$ and $D_6$, whose only difference is in the location of the supports. Focusing on $D_6$, variation in the support shows up in the heuristic metrics $S_{min}$, $S_{max}$, $\lvert S \rvert$, and especially in the variance of $S_{min}$ and $S_{max}$.

Graph dependent, support independent analysis. Data set three, the quadratic-Gauss aggregated data set, has homogeneity in support in all support-based metrics, as can be seen in Table IV, as expected. In contrast, both of the heuristic metrics designed to detect variation in PDFs registered as non-zero, meaning they detected variation in the PDFs. Moreover, the $h$-like diagnostic, $H_{RA}$, was more sensitive than the variance-based metric, $V_\sigma(p)$, as expected.

Support dependent graph-based analysis. By design, none of the examples mix graph and support effects simultaneously.



3. Quantifying small sample-size effects 

To form a baseline of small sample size effects for both real data applications and the support-based effects, we focus on comparing and contrasting results for $D_4$ and $D_5$, the quadratic map data sets with 50 sets of 20 points and 10 sets of 100 points, respectively.

Small sample size effects on non-TDMI-based 
support analysis metrics — The heuristic metrics of 
support diversity show homogeneity in support. How- 
ever, it is important to note that the invariant density 
of the quadratic map has most of its mass at the end 
points, and thus may represent the best case scenario for 
support based metrics on small data sets. 

Small sample size effects on TDMI-based support analysis metrics. The TDMI-based metrics of support diversity show homogeneity of support, although the individual-wise random permutation bias ($\bar{B}_{IRP}$) is rather high, especially for the 20 point data sets, as one might expect. However, we hypothesize that the primary reason why $\bar{B}_{IRP}$ is so high for the 20 point data sets is that, upon randomly permuting any data set, the average $\tau$ will be the length of the data set over 3, in this case $20/3 < 7$. Thus, for very short data sets, it can be difficult to approximate the estimator bias using only the random permutation method [19].
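The two flavors of random-permutation bias listed in Table II can be sketched as follows. This is an illustrative reading of the table, not the procedure of Ref. [19]: the individual-wise version shuffles each series separately before pooling the lagged pairs (so each individual's range is preserved), while the population-wise version shuffles the pooled values across individuals (so range information is destroyed). The histogram estimator, the bin count, the number of permutations, and all function names are assumptions; `support_metric` follows the $\mathcal{H}_S$ expression given in Table II.

```python
import numpy as np


def histogram_mi(x, y, bins=10):
    """Plug-in (histogram) mutual information estimate, in nats."""
    joint, _, _ = np.histogram2d(np.asarray(x, float), np.asarray(y, float), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))


def lagged_pairs(series, lag=1):
    s = np.asarray(series, dtype=float)
    return s[:-lag], s[lag:]


def individual_permutation_bias(population, lag=1, bins=10, n_perm=20, seed=0):
    """Shuffle each series separately, pool the lagged pairs, estimate MI."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_perm):
        pairs = [lagged_pairs(rng.permutation(s), lag) for s in population]
        x = np.concatenate([p[0] for p in pairs])
        y = np.concatenate([p[1] for p in pairs])
        estimates.append(histogram_mi(x, y, bins=bins))
    return float(np.mean(estimates))


def population_permutation_bias(population, lag=1, bins=10, n_perm=20, seed=0):
    """Shuffle the pooled values across individuals before pairing."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([np.asarray(s, dtype=float) for s in population])
    estimates = [
        histogram_mi(*lagged_pairs(rng.permutation(pooled), lag), bins=bins)
        for _ in range(n_perm)
    ]
    return float(np.mean(estimates))


def support_metric(b_irp, i_hat):
    """H_S = |B_IRP - I_hat| / I_hat, following the Table II entry."""
    return abs(b_irp - i_hat) / i_hat
```

For a population with disjoint supports, the individual-wise estimate stays large because the pooled pairs still encode which individual they came from, whereas the population-wise estimate falls to the small-sample bias level; the gap between the two is what $B_{RP}$ measures.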



Small sample size effects on non-TDMI-based graph analysis metrics. In contrast to the support-based effects, the heuristic-based PDF variability metrics register substantial diversity among the PDFs of $D_4$ and $D_5$, effects that are entirely a function of small sample sizes. These results are not surprising given that there will be great variance in the PDF estimate of a quadratic map time-series with only 20 points.

Small sample size effects on TDMI-based graph analysis metrics. The small sample size situation highlights both the difference between $\bar{I}$ and $\hat{I}$ and also displays the motivation for why one would want to estimate $\hat{I}$. The average-based TDMI results for both $D_4$ and $D_5$ do not approximate the 1000 point analogs; moreover, the addition of more sets of data with similar lengths will not help $\bar{I}$ converge to the higher point analog but will rather decrease the variance in the mean $\bar{I}$ value. Thus, the desired meaning of $\bar{I}$ is, in a sense, a precision/accuracy type problem; adding more 20 point data sets will make the estimate of $\bar{I}$ more precise, but not necessarily more accurate. That said, accuracy is always defined relative to a target; there is likely less TDMI in the 20 point data set because there is considerably less time-based information in a 20 or 100 point data set than in a 1000 point data set. Therefore, while adding more data sets will not aid in convergence to the infinite point analog, the infinite point analog may not be the right target to be aiming for with 20 point data sets. In contrast, the aggregated data sets produce a TDMI equivalent to the 1000 point analog, thus inducing a $\delta I$. Moreover, adding points to the aggregated data set will help with convergence to $I(\tau = 1)$ for infinitely long data strings.

Interpreting $\delta I$ when individual elements have few pairs of points. The existence of $\delta I$ for $D_4$ and $D_5$ introduces a form of divergence from $I(\tau = 1, N = \infty)$ that is not quite a bias (either estimator or non-estimator); the "true" amount of information in a data string of length 20 is fundamentally different from the "true" amount of information in a data string of length $N = \infty$; thus $\delta I$ can also exist due to finite sample size effects. Or, said more quantitatively, $\bar{I}$, even for an unlimited collection of 100 point data strings, will never be within estimator bias, or any other kind of bias, of $I(\tau = 1, N = \infty)$ because $I(\tau = 1, N = \infty) \approx 0.72$ while $I(\tau = 1, N = 20) \approx 0.48 \pm 0.1$. What this means for $\hat{I}$ is that, unless the aggregated data sets are homogeneous enough in their time-dependent correlation structure, $\hat{I}$ will likely represent population distribution information, as $\bar{I}$ would represent the upper bound on the time-correlation based information present in each data string. Often the composition of most real world data streams can be difficult to infer; moreover, it can be a non-trivial problem to discern whether $\bar{I}$ or $\hat{I}$ most faithfully represents population or individual effects. For instance, in Ref. [23], the authors claim that both time-correlation information and population-based time-correlation are simultaneously present. Usually a careful analysis of the population composition of the $\delta t$ bins will help rectify this difficulty.

non-TDMI-based population diversity metrics

| Source | $H(X)$ | Var($n_i$) | $S_{min} \pm V_{S_{min}}$ | $S_{max} \pm V_{S_{max}}$ | $\lvert S \rvert \pm V_{\lvert S \rvert}$ | $H_{RA}$ | $V_\sigma(p)$ |
|---|---|---|---|---|---|---|---|
| $D_1$ | | | 0.0001 | 0.999 | 0.9989 | | |
| $D_2$ | | | 0.0002 | 0.9998 | 0.9997 | | |
| $D_3$ | | | 0.0002 ± 0.0003 | 0.9989 ± 0.0015 | 0.9987 ± 0.0018 | 0.16 | 0.09 |
| $D_4$ | | | 0.02 ± 0.02 | 0.996 ± 0.006 | 0.98 ± 0.03 | 0.9 | 0.39 |
| $D_5$ | | | 0.001 ± 0.002 | 0.9997 ± 0.0006 | 0.998 ± 0.003 | 0.37 | 0.13 |
| $D_6$ | | | 5.5 ± 3 | 6.5 ± 3 | 0.997 ± 0.004 | 0.68 | 0.32 |

TABLE IV: Heuristic homogeneity metrics for the simulated data sets one through six.



B. Real data examples: glucose values for 100
densely sampled individuals versus 20,000 random
individuals

We now move on to applying the insights and techniques of the previous sections to real data. In particular, we will consider two data sets that contain different populations of patients from the CUMC data repository. More specifically, the data sets include:

Data set 7: a collection of the 100 patients with the most glucose measurements in the database, ranging from 4000 to ~1500 measurements per patient;

Data set 8: a collection of 20,000 random patients with at least 2 glucose measurements from among the 800,000 patients with glucose values.

To visualize these populations, consider Fig. 4, where the normalized PDFs for each individual for each population and the PDF of the overall populations are plotted. While the population-wide PDFs, shown in Fig. 4(c), are not wildly different, the relative diversity within the two populations, as shown in Figs. 4(a) and 4(b), is dramatic. The motivation for choosing $D_7$ is that, for this set, because each patient has at least 1000 lab values, both $\bar{I}$ and $\hat{I}$ are calculable. Moreover, the authors hypothesize that patients with so many glucose values are more likely to represent a more homogeneous population compared with the population at large. Given the makeup of $D_7$, $D_8$ represents not only a contrast to $D_7$, in that $D_8$ is a snapshot of the entire population, but $D_8$ also represents a pathologically difficult situation data-wise: very few patients have more than 100 glucose values, and the set of possible causes for the existence of a glucose measurement is extremely large (or broad). Thus, not only will $\bar{I}$ be difficult to calculate for $D_8$ (most patients won't have enough data to generate a PDF estimate), but there is likely tremendous and differing diversity amongst the patients actually included in the estimates of $\bar{I}$ and $\hat{I}$.



Finally, note that in contrast to the previous analysis 
of simulated data, we will present the TDMI results first, 
followed by an analysis using the non-TDMI metrics to 
verify the TDMI results. The point of this ordering is to 
demonstrate the TDMI infrastructure without hindsight 
knowledge. 



1. TDMI-based analysis for data set 7, the well measured 
population 



Analysis of the $\delta t = 6$ hrs time separation using the algorithm in Fig. 2. First, considering Table V, note that for $D_7$ with $\delta t = 6$ hrs, we are able to estimate $\bar{I}$, and thus $\delta I$, because $N_{min}(6\,\mathrm{hrs}) > 100$. Next, note that $\delta I(6\,\mathrm{hrs})$ is considerably above $B_{IRP}(6\,\mathrm{hrs})$, meaning that the population, on the time-scale of 6 hours, is heterogeneous. Moreover, both $\bar{I}(6\,\mathrm{hrs})$ and $\hat{I}(6\,\mathrm{hrs})$ are greater than zero, meaning that there is TDMI present in individuals and the aggregated population. To determine the nature of the heterogeneity, further consider the support-based metric; $\mathcal{H}_S(6\,\mathrm{hrs}) \approx 1$ points to the population having uniformity in supports or ranges ($B_{RP}(6\,\mathrm{hrs}) \approx B_{IRP}(6\,\mathrm{hrs})$, which corroborates this conclusion). Finally, the entire population is reasonably represented for $\delta t = 6$ hrs, as confirmed by the fact that $N_{min}(6\,\mathrm{hrs}) \approx 500$ and $\mathcal{H}_Q(6\,\mathrm{hrs}) > 0$. Thus, the concluding interpretation is as follows: the population is heterogeneous on the $\delta t = 6$ hrs time scale; the heterogeneity in the population is in the graphs, not the supports (or the normalizations); there is diverse but present temporal correlation among the population (i.e., the TDMI is not due to the population aggregation, but exists because of the individuals); and the entire population is well represented in the TDMI-based quantities.
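The chain of reasoning applied above (computability, then $\delta I$ against the bias, then the support metric) can be written as a small decision helper. This is a loose sketch of the logic in Fig. 2, not the algorithm from the appendix: the function name, the returned labels, and in particular the cutoff used to call $\mathcal{H}_S$ "approximately 1" are assumptions made for illustration.

```python
def interpret_population(delta_i, bias, h_s, n_min,
                         min_pairs=100, uniform_support_level=0.9):
    """Classify a population at one time separation.

    delta_i : |aggregate TDMI - averaged TDMI|
    bias    : permutation-bias level used as the threshold for delta_i
    h_s     : support metric (close to 1 means uniform supports)
    n_min   : fewest pairs contributed by any individual
    uniform_support_level is a hypothetical cutoff, not from the text.
    """
    supports = "uniform" if h_s >= uniform_support_level else "diverse"
    if n_min < min_pairs:
        # the averaged TDMI, and hence delta_i, cannot be trusted
        return {"average_computable": False, "population": None,
                "supports": supports, "diversity_source": None}
    heterogeneous = delta_i > bias
    if not heterogeneous:
        source = None
    elif supports == "uniform":
        source = "graphs"
    else:
        source = "supports and possibly graphs"
    return {"average_computable": True,
            "population": "heterogeneous" if heterogeneous else "homogeneous",
            "supports": supports,
            "diversity_source": source}
```

With the Table V values for data set 7 at 6 hours (`delta_i=0.42`, `bias=0.02`, `h_s=1.0`, `n_min=470`) the helper reports a heterogeneous population with uniform supports and graph-based diversity, matching the interpretation given in the text.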

Analysis of the $\delta t = 24$ hrs time separation using the algorithm in Fig. 2. First, considering Table VI, note that for $D_7$ with $\delta t = 24$ hrs, we are able to estimate $\bar{I}$, and thus $\delta I$, because $N_{min}(24\,\mathrm{hrs}) > 100$. Next, note that $\delta I(24\,\mathrm{hrs})$ is within the error bars of zero (e.g., below $B_{IRP}(24\,\mathrm{hrs})$), meaning that the population, on the time-scale of 24 hours, is homogeneous. Moreover, both $\bar{I}(24\,\mathrm{hrs})$ and $\hat{I}(24\,\mathrm{hrs})$ are greater than zero, meaning that there is TDMI present in individuals and the aggregated population. To check the supports, further consider the support-based metric; $\mathcal{H}_S(24\,\mathrm{hrs}) \approx 1$ points to the population having uniformity in supports or ranges ($B_{RP}(24\,\mathrm{hrs}) \approx B_{IRP}(24\,\mathrm{hrs})$, which corroborates this conclusion). Finally, the entire population is reasonably represented for $\delta t = 24$ hrs, as confirmed by the fact that $N_{min}(24\,\mathrm{hrs}) \approx 500$ and $\mathcal{H}_Q(24\,\mathrm{hrs}) > 0$. Thus, the concluding interpretation is as follows: the population is homogeneous on the $\delta t = 24$ hrs time scale; there is temporal correlation present among the population (i.e., the TDMI is not due to the population aggregation, but exists because of the individuals); and the entire population is well represented in the TDMI-based quantities.

TDMI-based quantities for the $\delta t = 6$ hrs time separation

| Source | $\bar{I}$ | $\hat{I}$ | $\delta I$ | $\bar{B}_{IRP}$ | $B_{IRP}$ | $B_{PRP}$ | $B_{RP}$ | $\mathcal{H}_S$ | $\mathcal{H}_Q$ | $N_{min}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $D_7$ | 0.64 ± 0.03 | 0.22 | 0.42 ± 0.03 | 0.02 ± 0.01 | 0.02 ± 0.005 | 0.001 ± 0.0005 | $\approx B_{IRP}$ | 1 ± 0.0005 | 0.31 | 470 |
| $D_8$ | 0.29 ± 0.16 | 0.38 | 0.09 ± 0.37 | 0.2 ± 0.2 | 0.08 ± 0.005 | 0.006 ± 0.0005 | $\approx B_{IRP}$ | 1 ± 0.02 | 0.003 | 1 |

TABLE V: TDMI results and homogeneity metrics for the real patient data sets seven and eight; note all $\delta t$ times are in hours.

TDMI-based quantities for the $\delta t = 24$ hrs time separation

| Source | $\bar{I}$ | $\hat{I}$ | $\delta I$ | $\bar{B}_{IRP}$ | $B_{IRP}$ | $B_{PRP}$ | $B_{RP}$ | $\mathcal{H}_S$ | $\mathcal{H}_Q$ | $N_{min}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $D_7$ | 0.093 ± 0.06 | 0.077 | 0.016 ± 0.06 | 0.02 ± 0.01 | 0.02 ± 0.005 | 0.001 ± 0.0005 | $\approx B_{IRP}$ | 0.99 ± 0.01 | 0.33 | 479 |
| $D_8$ | 0.21 ± 0.15 | 0.17 | 0.04 ± 0.15 | 0.3 ± 0.2 | 0.07 ± 0.01 | 0.005 ± 0.001 | $\approx B_{IRP}$ | 0.97 ± 0.001 | 0.005 | 1 |

TABLE VI: TDMI results and homogeneity metrics for the real patient data sets seven and eight; note all $\delta t$ times are in hours.

time independent TDMI-based quantities

| Source | $\bar{h}$ | $\hat{h}$ |
|---|---|---|
| $D_7$ | 1.39 ± 0.07 | 2.12 |
| $D_8$ | 0.8 ± 0.22 | 2.05 |

TABLE VII: Time independent TDMI results for the real patient data sets seven and eight.



Analysis independent of time. Considering the entropy calculations in Table VII, $D_7$ renders some heterogeneity because the difference between $\bar{h}$ and $\hat{h}$ is non-zero. Nevertheless, as we will see for $D_8$, an entropy difference of 0.73, which is about half the magnitude of $\bar{h}$, would argue that the static information theoretic interpretation of the population is of relative homogeneity.



Sample size issues. There were no sample size issues with respect to either of the $\delta t$ time separations studied; in both cases, $N_{min}$ was well over 100, and thus all PDFs and their respective biases could be accurately estimated. In fact, careful analysis of the population make-up in each $\delta t$ between 6 hrs and 56 hrs revealed that the proportional contribution of each individual remained relatively constant. Finally, Fig. 6, where the TDMI estimated using both KDE and histogram estimation schemes is shown, confirms the lack of any small sample size effects because both estimation schemes are essentially equal.



2. Non-TDMI-based analysis for data set 7, the well
measured population



Non-TDMI support-based analysis. To verify the TDMI-based results, begin by observing that the heuristic metric that quantifies variation in the supports, $H(X) \approx 1$, is considered small. Thus, while there is some diversity in how the patients were measured, the variation in how patients are measured is small. This claim is also justified by the fact that the variance in the number of points contributed, per patient, to the $\delta t = 6$ hrs bin, Var($n_i$), is small. Finally, the variance in $S_{min}$, $S_{max}$ and $\lvert S \rvert$ is small compared to the respective values (cf. Fig. 5(a)). Because these are time-independent measures of the support, and because adding the temporal aspect of the analysis only makes the data set smaller, it is likely that the TDMI analysis of the homogeneity of support is correct.

Non-TDMI graph-based analysis. The most sensitive PDF variation metric, $H_{RA}$, points to a relatively diverse population, while the less sensitive PDF variation metric $V_\sigma(p)$, based on the standard deviation of the distribution of PDFs, points to a relatively homogeneous, yet not totally homogeneous, population. Figure 5 confirms this analysis visually. The maxima minus the minima, which, when integrated, is essentially $H_{RA}$, shown in Fig. 5(b), can be seen to be relatively large, thus making $H_{RA}$ render diversity. In contrast, the variance in the graphs of the PDFs, shown in Fig. 5, is seen as relatively small for $D_7$, thus making $V_\sigma(p)$ render relative homogeneity. It is important to note, however, that $V_\sigma(p)$, which is independent of time, does not detail the fact that the population has diverse predictive information for time periods less than 6 hours; this is an important distinction to make, as it implies that prediction can vary with time despite the overall distribution of physiological variables. Finally, both the TDMI and the heuristic analysis conclude that the population is homogeneous in supports and that, in the long term (i.e., independent of time), the population is homogeneous; this is because $\delta I \approx 0$ for $\delta t > 12$ hrs and $V_\sigma(p)$ is small.

This interpretation will be substantiated further in sec-

non-TDMI-based analysis metrics

| Source | $H(X)$ | Var($n_i$) | $S_{min} \pm V_{S_{min}}$ | $S_{max} \pm V_{S_{max}}$ | $\lvert S \rvert \pm V_{\lvert S \rvert}$ | $H_{RA}$ | $V_\sigma(p)$ |
|---|---|---|---|---|---|---|---|
| $D_7$ | 1.042 | 463.7 | 29.7 ± 10.7 | 445.0 ± 58.8 | 415.4 ± 62.7 | 0.898 | 0.432 |
| $D_8$ | 30 | 55 | 84 ± 35 | 150 ± 122 | 66 ± 125 | 1 | 0.90 |

TABLE VIII: Heuristic homogeneity metrics for the real patient data sets seven and eight.



3. TDMI-based analysis for data set 8, the random (less 
well measured) population 

Analysis of the 5t = Q hrs time separation us- 
ing the algorithm in Fig. [2] — First, considering table 
[V| note that for with a. 5t = 6hrs, we are not re- 
ally able estimate I{6hrs) because Nmin{Qhrs) = 1. To 
interpret I{6hrs), we consider the support-based met- 
ric; 'Hsi'ohrs) ^ 1 which points to the population, which 
was filtered and has time points separated by 6 hours, 
having uniformity in supports or ranges (Bf(p{6hrs) ~ 
BiRp{6hrs) which corroborates this conclusion). To give 
intuition to the graph-based variation, consider Vg{p) 
(table VIII), which implies a somewhat diverse popula- 



tion. Moreover, Vg{p) for Dg, is twice that of Dt, implying 
that the population in Dg is more diverse than that of Dy. 
Moving beyond the algorithm shown in Fig. [2| we did 
estimate liQhrs) and thus, 61{6hrs), only including indi- 
viduals with enough points to estimate /. Based on this 
restricted version of 61{6hrs), the population appears 
to be homogeneous. Nevertheless, both the restricted 
I{6hrs) and I{6hrs) are greater than zero, meaning that 
there is TDMI present in individuals and the aggre- 
gated population. This means that there is an apparent 
contradiction; the restricted 61{6hrs) implies a popula- 
tion that is somewhat homogeneous/heterogeneous while 
V^(j)) implies a heterogeneous population. This contra- 
diction is resolved by recalling that V^{p) is calculated 
on the entire, non-filtered population and is independent 
of time and will overestimate graphic diversity, while 
SI is overly restricted and will underestimate diversity. 



tions |IXB 5 1 and IXB 6[ Finally, the overall population is 
poorly represented for 6t — 6hrs as confirmed by the fact 
that Nmini^hrs) — 1 and HQ{6hrs) « 0. In fact, for Dg, 
we know that 63% of the patients (12, 763) have no points 
in the 6t = 6hrs bin, and only 12% (2, 400) of the pa- 
tients have ten or more points in the St = 6hrs bin. Thus, 
the concluding interpretation is as follows: the popula- 
tion is homogeneous on the St = 6hrs time scale up to 
what is resolvable by 61{6hrs); the represented popula- 
tion has relatively uniform supports; there is diverse but 
present temporal correlation among the population (i.e., 
the TDMI is not due to the population aggregation, but 
exists because of the individuals) ; the population has di- 
versity relative to their time-independent graphs, but this 
graph diversity may not reflect the graph diversity of the 
represented population (i.e., the population used to esti- 
mate the TDMI-based quantities) ; the overall population 
of patients is poorly represented in the TDMI-based di- 
agnostics; and finally the overall population of 20, 000 pa- 
tients is diverse, but the patients that have enough data 
to estimate the TDMI on time-scales of St < 48hrs (i.e., 
the represented population), which represents a strongly 
filtered subpopulation, is relatively homogeneous in pre- 
dictive information regardless of St. 
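The representation statistics quoted above (the share of patients with no points in a δt bin, the share with ten or more, and the smallest contribution) can be sketched as follows. This is a minimal illustration, not the procedure used to produce the reported numbers. The names `pairs_in_bin` and `representation` are hypothetical. The bin is assumed to contain pairs of measurements whose separation lies in the half-open interval (lo, hi]. N_min is taken here to be the smallest number of pairs contributed by any contributing patient.

```python
def pairs_in_bin(times, lo, hi):
    """Count pairs of one patient's measurement times with lo < separation <= hi."""
    ts = sorted(times)
    count = 0
    for a in range(len(ts)):
        for b in range(a + 1, len(ts)):
            gap = ts[b] - ts[a]
            if gap > hi:  # later points are even further away
                break
            if gap > lo:
                count += 1
    return count


def representation(population, lo, hi, enough=10):
    """population: dict mapping a patient id to a list of measurement times (hours)."""
    counts = [pairs_in_bin(times, lo, hi) for times in population.values()]
    contributing = [c for c in counts if c > 0]
    n = len(counts)
    return {
        "N_min": min(contributing) if contributing else 0,
        "frac_none": sum(c == 0 for c in counts) / n,
        "frac_enough": sum(c >= enough for c in counts) / n,
    }
```

A large `frac_none` together with a small `frac_enough` is the situation described above, in which the TDMI-based diagnostics describe only a strongly filtered subpopulation.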

Analysis of the δt = 24 hrs time separation using the algorithm in Fig. 2. Considering
table VI (and later, Fig. 6(b)), the analysis of the TDMI diagnostics for δt = 24 hrs
is essentially identical to the δt = 6 hrs case. Even the representative population
for the δt = 6 and 24 hrs bins is essentially identical, down to the individual
proportional contributions to the aggregated data set. Thus, the key observation here
is the difference between D_8 and D_7: D_7 registered heterogeneity at δt = 6 hrs and
homogeneity at δt = 24 hrs, whereas D_8 does not show a δt dependence in the
TDMI-based diagnostics.
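To make the comparison between the average and the aggregate TDMI concrete, the sketch below computes both from pre-discretized pairs (x(t), x(t + δt)) and reports their difference. It is an illustration under stated assumptions, not the estimator used in this paper. The values are assumed to be already binned into discrete symbols, a plug-in estimator is used, no bias correction is applied, and the sign convention (aggregate minus average) is chosen for the example. The names `mutual_information` and `average_and_aggregate` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in mutual information (in nats) for a list of discrete symbol pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        c / n * math.log(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


def average_and_aggregate(population_pairs):
    """population_pairs: dict mapping a patient id to a list of (x(t), x(t + dt)) pairs.

    Returns (average TDMI, aggregate TDMI, aggregate minus average).
    """
    per_patient = [mutual_information(p) for p in population_pairs.values() if p]
    average = sum(per_patient) / len(per_patient)
    pooled = [pair for p in population_pairs.values() for pair in p]
    aggregate = mutual_information(pooled)
    return average, aggregate, aggregate - average
```

For two individuals whose values are each temporally independent but occupy disjoint ranges, the per-individual estimates vanish while the pooled estimate equals log 2, so the difference is positive purely because of population heterogeneity.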

Analysis independent of time. The entropy calculations in table VII indicate
heterogeneity because the difference between the average and aggregate entropies is
non-zero. In particular, compared to the entropy differences for D_7, D_8 has an
entropy difference of magnitude approximately 1.25, which is substantially larger.
Thus the static information theoretic interpretation of the population in D_8, which
includes all patients (there is no filtering effect), is one of heterogeneity.

Sample size issues. There are three sample size issues present in the TDMI analysis of
D_8: the poor representation of the overall population, the inability to estimate
I_avg for every representative member of the population, and the overall small sample
size and bandwidth/normalization issues. The first issue implies that the probability
mass used to estimate the PDFs comes from a very small subset of the population; e.g.,
only 12% of the population has 10 or more points in the δt = 6 hrs bin. Thus, the
restricted (i.e., filtered) population is likely substantially more homogeneous than
the overall population, and the TDMI analysis cannot be said to represent the overall
population. Relative to the second issue, since N_min = 1 (for both δt = 6 and
24 hrs), I_avg(δt) is representative of a smaller population than I_agg(δt). Finally,
the third issue, small sample size effects, can be seen in the large difference (about
a factor of 2) between the KDE and histogram estimator based TDMI values seen in
Fig. 6.

FIG. 4: PDFs of glucose measurements for individuals within a population and for a
population for two data sets, the 100 patients with the largest records and 20,000
random patients. Panels: (a) individual PDF estimates for the 100 patients with the
largest record; (c) aggregated population PDF comparison.

FIG. 5: Comparisons of the supports and PDF graph variations for two data sets, the
100 patients with the largest records and 5000 random patients. Panels: (a)
comparisons of support minima, maxima, and length for the two populations; (c)
comparisons of the standard deviation of the PDF graphs for the two populations.
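One ingredient of the small sample size effects mentioned above is the positive bias of plug-in mutual information estimates when few pairs are available. The sketch below illustrates this with an equal-width histogram estimator applied to independent uniform pairs, for which the true mutual information is zero. It does not reproduce the KDE versus histogram comparison of Fig. 6. The function names, the number of bins, and the synthetic data are assumptions made for the example.

```python
import math
import random


def histogram_mi(pairs, bins=8):
    """Plug-in mutual information (in nats) from a 2-D equal-width histogram."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    def index(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    joint, px, py = {}, {}, {}
    for x, y in pairs:
        i, j = index(x, x_lo, x_hi), index(y, y_lo, y_hi)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] = px.get(i, 0) + 1
        py[j] = py.get(j, 0) + 1
    n = len(pairs)
    return sum(
        c / n * math.log(c * n / (px[i] * py[j])) for (i, j), c in joint.items()
    )


def mean_mi_of_independent_pairs(n_pairs, trials=50, seed=0):
    """Average estimate over independent uniform pairs; the true value is zero."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pairs = [(rng.random(), rng.random()) for _ in range(n_pairs)]
        total += histogram_mi(pairs, bins=8)
    return total / trials
```

Comparing `mean_mi_of_independent_pairs(30)` with `mean_mi_of_independent_pairs(3000)` shows the estimate shrinking toward zero as the number of pairs grows, which is why estimates built from few pairs per individual must be interpreted relative to an estimate of the bias.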



4. Non-TDMI-based analysis for data set 8, the random (less well measured) population

Non-TDMI support-based analysis. Begin by noticing that there is considerable
diversity in how the 20,000 patients are measured, as can be seen in H(X) ≈ 30, which
is 30 times larger than H(X) for D_7. Considering this in conjunction with
Var(n_i) ≈ 50 for D_8, which is much smaller than Var(n_i) for D_7, implies that very
few of the patients have many points. Said differently, the reason why Var(n_i) is
relatively small compared to H(X) is that n_i is bounded from below by 0 and is never
very large for any member of D_8. That this is the case is reflected in the variance
in S_min, S_max, and |S|, which is large (on the order of, or greater than, the values
of S_min, S_max, and |S|, respectively; cf. Fig. 5). Heuristically, this effect can be
seen by comparing the range of values seen in Fig. 4(b) with that in Fig. 4(a): the
population of 20,000 yields a range of glucose values roughly five times that of D_7.

Non-TDMI graph-based analysis. The most sensitive PDF variation metric, H_RA, points
to a relatively diverse population. In contrast to the results for D_7, the less
sensitive PDF variation metric, V_g(p), also points to a heterogeneous population; in
particular, V_g(p) is just about twice the V_g(p) for D_7.
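The two graph-based quantities discussed above can be illustrated with unnormalized analogues: the integrated envelope (pointwise maximum minus minimum of the individual PDF estimates) and the integrated pointwise standard deviation of those estimates. The sketch below is only meant to convey the idea. The exact definitions and normalizations of H_RA and V_g(p) are given earlier in the paper and are not reproduced here. The histogram estimator, the common grid, and the names `histogram_pdf` and `graph_variation` are assumptions. All values are assumed to lie within [lo, hi].

```python
from statistics import pstdev


def histogram_pdf(values, lo, hi, bins):
    """Histogram density estimate on a common equal-width grid over [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        k = min(int((v - lo) / width), bins - 1)
        counts[k] += 1
    n = len(values)
    return [c / (n * width) for c in counts]


def graph_variation(population, lo, hi, bins=20):
    """Return (integrated max-minus-min envelope, integrated pointwise std. dev.)."""
    width = (hi - lo) / bins
    pdfs = [histogram_pdf(values, lo, hi, bins) for values in population.values()]
    columns = list(zip(*pdfs))  # one column of density values per grid cell
    envelope = sum((max(col) - min(col)) * width for col in columns)
    spread = sum(pstdev(col) * width for col in columns)
    return envelope, spread
```

The envelope reacts to a single outlying individual, whereas the standard deviation averages over the population, which mirrors the distinction drawn above between the more sensitive and the less sensitive metric.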



5. Analysis of the TDMI under variation of δt

A central motivation for using the TDMI is to observe how nonlinear correlation
evolves in time; however, in the context of a diversely measured population, one must
take care to ensure the TDMI signal represents a relatively constant population.
Relative to D_7 and D_8, we know that, for δt between 6 and at least 56 hours, the
representative population is roughly constant. Figure 6 details the temporal evolution
of the TDMI and, with it, exhibits five notable features.

First, both data sets display diurnal peaks in predictability; a full explanation of
these peaks, which is dependent on the structure of meal times, is given in [55]. This
is scientifically interesting because it is a signal that can be used to test
physiological models, it can be used to distinguish populations, it implies that
outside of very local time windows, measurements separated by 24 hours are more
informative than measurements separated by fewer hours, and finally, the diurnal peaks
confirm the presence of diurnal cycles in humans that are believed to exist.

Second, relative to D_7, the population appears to be heterogeneous on time scales of
6 hours and less, and homogeneous on time scales longer than 6 hours. This can be seen
in Fig. 6(a), where δI(6 hrs) is relatively large and drops to zero by δt = 12 hrs.
This is an interesting result that we are still working to understand.

Third, by comparing the results for D_7 and D_8, we can observe a difference in the
degree of homogeneity amongst the population. In particular, combining the facts that
the error bars for the TDMI estimates are large for D_8 compared to D_7, δI is
independent of δt for D_8, δI for D_8 is much larger than for D_7, and the broad
qualitative TDMI signal (i.e., the diurnal peaks) is the same for both D_7 and D_8, it
seems clear that both data sets have somewhat homogeneous populations (i.e.,
homogeneous enough to resolve a similar signal), but D_7 is considerably more
homogeneous than D_8.

Fourth, considering Fig. 6(b), it is clear that the aggregate TDMI resolves the
diurnal peaks considerably better than the average TDMI. This confirms the usefulness
of the aggregate TDMI in the context of a complex, diversely measured population.

And fifth, the small sample size effects are clearly evident when comparing the
difference between the histogram and KDE estimates of the TDMI between Figs. 6(a) and
6(b). In particular, the two different estimates for the aggregate TDMI on D_7 are
essentially identical, while the aggregated TDMI estimates on D_8 differ in a
nontrivial way (by more than a factor of two). The average TDMI calculations display
an even stronger effect. Finally, the error bars for D_8 are about ten times the
magnitude of the error bars for D_7.

The point is that the time evolution of the TDMI is both scientifically valuable, in
that it leads to insights not otherwise observed, and interpretable in the context of
a time dependent, complex, diversely measured population using the infrastructure
presented in this paper.


6. Independent analysis of the population composition of D_7 and D_8

Based on the time-based information theoretic analysis, we have reached the following
population-composition hypotheses: data set seven represents a homogeneous population
for δt > 6 hrs and is heterogeneous for δt ≤ 6 hrs; the subpopulation of data set
eight used to estimate the TDMI for δt < 48 hrs is relatively homogeneous, but less
homogeneous than data set seven; overall, data set eight is heterogeneous. However,
because these populations are real patients from a hospital, we can also examine other
sources of information regarding the qualitative types these populations represent.
Specifically, we can consider the billing codes, which can act as a proxy for
population composition, assigned to the patients in the various populations. It is
important to note that the billing codes are largely independent of the specific lab
values, and thus can be seen as an outside test of the validity of the TDMI analysis.

FIG. 6: The TDMI for both I_avg and I_agg with δt bins of six hours for a period of a
few days. Panels: (a) TDMI for I_avg and I_agg with δt bins of six hours for a period
of 60 hours for the 100 patients with the most glucose values, using both the
histogram and KDE PDF estimation techniques; (b) TDMI for I_avg and I_agg with δt bins
of six hours for a period of 72 hours for the 20,000 randomly selected patients, using
both the histogram and KDE PDF estimation techniques. With respect to Fig. 6(a), note
the following: for δt ≤ 6 hrs, δI > 0; for δt > 6 hrs, δI ≈ 0; the KDE and histogram
estimates are extremely similar; the diurnal (daily) periodic variation in correlation
of glucose is clearly evident in both I_avg and I_agg. With respect to Fig. 6(b), note
the following: for all δt, δI is consistent and likely zero within bias; the KDE and
histogram estimates differ greatly, implying the presence of small sample size effects
in the average TDMI calculation; the diurnal (daily) periodic variation in correlation
of glucose is clearly evident in both I_avg and I_agg in all but the KDE estimated
TDMI average.

We consider the fraction of patients with the two most frequent billing codes for
three data sets: D_7, D_8, and the subset of D_8 used to estimate the TDMI-based
diagnostics, D'_8 (members of this D_8 subpopulation have at least 10 glucose
measurements separated by six hours or less). Note that a patient is counted as having
a billing code even if it occurs only once. There are two features that are important
to pay attention to: (i) the overall fraction of patients that have a given billing
code, and (ii) the drop-off between the fraction of patients with the most and second
most common billing codes. For D_7, 75% of the patients are covered by a single
billing code and the drop between the most and second most common billing codes is
around 5%; thus 70+% of these patients likely have relatively similar afflictions. In
contrast, the most frequently seen billing code in D_8 only covers 25% of the
population, followed by a 10 point drop-off. Finally, at least 50% of D'_8 is covered
by a single billing code, while the second most common billing code covers only a
quarter of the population, a 25 point drop. This implies more homogeneity than D_8,
but less than D_7. Broadly speaking, the billing code analysis corroborates the
conclusions drawn from the time-based information theoretic analysis in the previous
section. Nevertheless, the billing code analysis, being static, does not reveal the
heterogeneity observed in D_7 at δt = 6 hrs.
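The billing code comparison described above reduces to two numbers per data set: the fraction of patients carrying each of the most common codes, and the drop-off between the first and second. A minimal sketch follows. The function name `top_code_coverage`, the input layout, and the code labels used when calling it are hypothetical. A patient is counted for a code if the code occurs at least once in that patient's record.

```python
from collections import Counter


def top_code_coverage(patient_codes, top=2):
    """patient_codes: dict mapping a patient id to an iterable of billing codes.

    Returns the list of (code, fraction of patients) for the `top` most common
    codes, and the drop between the first and second fractions (None if there
    is only one code).
    """
    n = len(patient_codes)
    carriers = Counter()
    for codes in patient_codes.values():
        carriers.update(set(codes))  # a single occurrence is enough to count
    fractions = [(code, count / n) for code, count in carriers.most_common(top)]
    drop = fractions[0][1] - fractions[1][1] if len(fractions) > 1 else None
    return fractions, drop
```

A high leading fraction with a small drop suggests that most patients share similar codes, while a low leading fraction points to a more mixed population.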



X. SUMMARY

Note, an explicit prescription for interpreting I for a fixed time separation δt for a
population can be found in Fig. 2 within section VIII. Moreover, an algorithmic
portrayal can be found in appendix A 3.

Results of the interpretative framework relative to real data. The methods in this
paper were shown to work for both a well understood computer-generated data set and
for a pathologically diverse real data set. Thus, given a population of time-series
that are non-uniformly measured in time, of diverse lengths, from statistically
diverse sources, nonstationary, and pathologically sparse, our methods will likely
still yield interpretable results. The entropy for all populations registered the
populations as diverse. Nevertheless, the TDMI produced a more nuanced picture. In
particular, for one set of patients, the TDMI calculation implied that the patients
have differing predictive information up to 6 hours and are homogeneous in correlation
afterwards. In contrast, the same calculation on a heavily filtered general population
(the population that had frequent data measurements) yielded a population that seemed
homogeneous with respect to time-dependent correlation. While these two sets of
patients were similar according to their billing codes, they differed in some key
features. Thus, while it is likely that these populations are different, a full
explanation, which requires more clinical study, is beyond the scope of this paper.
Nevertheless, the TDMI analysis yielded results that were understandable, given this
pathologically difficult population of data.

How our method addresses nonstationarity. At various points in this paper we have
alluded to how nonstationarity is addressed within our framework. To be more explicit,
consider three cases: (i) a single nonstationary source, (ii) multiple different
stationary sources, and (iii) multiple different nonstationary sources. Relative to
case (i), because there is no real sense of population average, δI ≈ 0 and
B_IRP = B_PRP; thus there will be no distinction between stationarity and
nonstationarity. Case (ii) is the case we handled in section IX A and does not need
explanation. Case (iii) will behave identically to case (ii); nonstationarity will be
difficult to detect, but multiple different statistical states will be detectable.
While it might be too much to ask to be able to distinguish nonstationarity amongst a
population from a population with multiple stationary sources, we can detect
nonstationarity within an individual, given enough data points. In particular,
relative to case (i), the reason why all the diagnostics fail to detect multiple
statistical states is that there is no concept of averaging over a population. To
address this issue, one only needs to partition the single time series into multiple
pieces (of sufficient length), and then apply the standard TDMI analysis from this
paper to the new "population" of time series. Said differently, to detect
nonstationarity in a single source, one only needs to treat the single source as
multiple sources and apply our machinery; if it appears that there are multiple
sources, then one knows that the single source has multiple statistical states, and is
thus nonstationary.
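The partitioning step described above can be sketched as follows: a single time series is cut into contiguous pieces, and the pieces are then treated as a population. The function name `partition_series`, the equal-count splitting rule, and the `min_points` guard are assumptions made for this illustration. What counts as a piece of sufficient length depends on the estimator and is not settled by this sketch.

```python
def partition_series(times, values, pieces, min_points=2):
    """Split one series into contiguous, time-ordered pieces.

    Returns a dict mapping a piece label to (times, values) so that the result
    can be treated as a population of time series.
    """
    if len(times) != len(values):
        raise ValueError("times and values must have equal length")
    order = sorted(range(len(times)), key=times.__getitem__)
    size = len(order) // pieces
    if size < min_points:
        raise ValueError("not enough points per piece")
    population = {}
    for k in range(pieces):
        # the last piece absorbs any remainder
        stop = (k + 1) * size if k < pieces - 1 else len(order)
        idx = order[k * size:stop]
        population[f"piece_{k}"] = (
            [times[i] for i in idx],
            [values[i] for i in idx],
        )
    return population
```

If the population-level diagnostics applied to these pieces indicate several distinct sources, the original series has more than one statistical state.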

Comments regarding the connection between the supports and the normalizations of the
distributions. In a sense, all support-based variation amongst the population could be
eliminated by normalizing all individuals to some standard support (or to a
distribution with mean zero and variance one). We did not implement this because
sometimes the normalization of the support matters with respect to the composition of
the population, and we wanted to allow for the TDMI infrastructure to capture this
type of dependence. Relative to the example in this paper, having glucose oscillate
around 500 means the patient is very sick, whereas glucose oscillation around 100
means the patient is likely healthy (at least from a blood glucose perspective); we
wanted to be able to capture this type of heterogeneity. That said, if one begins with
a normalized population and performs the TDMI analysis, any δI must exist because of
variation in the graphs of the PDFs. However, if one has enough points per patient to
estimate I_avg, one knows this anyway upon calculating B_IRP and B_PRP; when there are
not enough points to estimate I_avg for every individual, then deducing temporal,
graph-based variation is difficult.
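For reference, the normalization that was deliberately not applied in this paper (rescaling every individual to mean zero and variance one) can be sketched as below. The function names are hypothetical, and the example values of 500 and 100 are taken only from the glucose illustration above. After this step two individuals oscillating around very different levels become indistinguishable in support, which is exactly the information the analysis was meant to retain.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale one individual's values to mean zero and variance one."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize_population(population):
    """population: dict mapping a patient id to a list of values."""
    return {pid: standardize(values) for pid, values in population.items()}
```

With a normalized population, any remaining difference between the aggregate and average TDMI has to come from the shapes of the PDFs rather than from their supports.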

Future directions regarding the use of this tech- 
nique. One of the sources of motivation for performing 
this calculation is based on the idea of stratifying or clus- 
tering populations of individuals by their predictive infor- 
mation. Based on the TDMI infrastructure here, we have 
identified at least 3 different subpopulations based on 
their predictive information structure. Thus future com- 
putational problems will involve developing and testing a 
more automated form of this interpretive structure that 
can be used for generating hypothesized sub-categories 
of individuals and eventually an infrastructure that can 
be integrated with classification and clustering schemes. 

Some remaining statistical problems. In this work we attempted to outline and show,
mathematically, how to interpret the TDMI and information entropy for aggregated
populations. Nevertheless, there are many details that remain. In particular, a
partial list might include full rigorous proofs regarding: the technical conditions
under which our claims (i.e., δI > 0 if and only if ε_i > 0 for some i) apply; the
convergence properties of various quantities we propose (e.g., δI, H_S, etc.); and the
full relationships between what the information entropy and TDMI can imply about one
another. The goal of this work was to propose a practically workable framework for
calculating the TDMI for complex populations of time series. However, this work leaves
many interesting, more abstract questions remaining.
Appendix A: Analysis of aggregation order

1. Detailed average TDMI calculation

Begin by recalling the definition of the average TDMI:

$$
I_{\mathrm{avg}}(\tau) = \frac{1}{N}\sum_{i=1}^{N}\int p(X_i(j), X_i(j-\tau))
\log\left(\frac{p(X_i(j), X_i(j-\tau))}{p(X_i(j))\,p(X_i(j-\tau))}\right)
dX(t)\,dX(t+\tau). \tag{A1}
$$

Next, recall that for the average TDMI, we have PDFs defined entirely with respect to
the abstract support, $S$. In this situation, we define the $i$-th PDF relative to the
"average" PDF, $\bar{p}$, by:

$$
p_i = \bar{p}(S) - \epsilon_i(S) \tag{A2}
$$

where $\epsilon_i(S)$ is the distance between the graphs of $p_i$ and $\bar{p}$ at a
given value in $S$. Next, for convenience, define the following:
$p(X_i(j), X_i(j-\tau)) = p_i(j,\tau)$, $p(X_i(j)) = p_i(j)$,
$p(X_i(j-\tau)) = p_i(\tau)$, $\epsilon_i(S) = \epsilon_i$,
$p_i(j,\tau) = \bar{p}(j,\tau) - \epsilon_i$, $p_i(j) = \bar{p}(j) - \epsilon_i$, and
$p_i(\tau) = \bar{p}(\tau) - \epsilon_i$. With this notation, we can now re-write the
integrand in Eq. A1.
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Next, factoring ^/■l'''^? x out of the summation term, one 
arrives at: 
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Multiplying and collecting terms under the sum, one ob- 
tains: 
= p(τ) + G(τ) (A15)

where G(τ) can be shown to have the more digestible form:
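2. Detailed aggregate TDMI calculation

Begin by recalling the definition of the TDMI for an aggregate population:

$$
I_{\mathrm{agg}}(\tau) = \int p(X(j), X(j-\tau))
\log\left(\frac{p(X(j), X(j-\tau))}{p(X(j))\,p(X(j-\tau))}\right)
dX(t)\,dX(t+\tau). \tag{A17}
$$

Next, recall that in this situation we first select an "average" PDF relative to the
abstract support $S$ and then we define the $i$-th PDF relative to this "average" PDF
on the total support $S$, $\bar{p}$, by:

$$
p_i = \bar{p}(S) - \epsilon_i(S) \tag{A18}
$$

where $\epsilon_i(S)$ is the distance between the graphs of $\bar{p}$ and $p_i$ at a
given value in $S$. Next, for convenience, define the following:
$p(X_i(j), X_i(j-\tau)) = p_i(j,\tau)$, $p(X_i(j)) = p_i(j)$,
$p(X_i(j-\tau)) = p_i(\tau)$, $\epsilon_i(S) = \epsilon_i$,
$p_i(j,\tau) = \bar{p}(j,\tau) - \epsilon_i$, $p_i(j) = \bar{p}(j) - \epsilon_i$, and
$p_i(\tau) = \bar{p}(\tau) - \epsilon_i$, never forgetting that all of these
quantities depend on a particular value in the support, $S$. With this notation, we
can now re-write the integrand in Eq. A17 in terms of only $\bar{p}$ and $\epsilon$,
arriving at:

$$
\frac{1}{N}\sum_{i=1}^{N}\bigl(\bar{p}(j,\tau) - \epsilon_i\bigr) \tag{A19}
$$
$$
\times \log\left(\frac{\frac{1}{N}\sum_{i=1}^{N}\bigl(\bar{p}(j,\tau) - \epsilon_i\bigr)}
{\frac{1}{N^2}\sum_{i=1}^{N}\bigl(\bar{p}(j) - \epsilon_i\bigr)\sum_{i=1}^{N}\bigl(\bar{p}(\tau) - \epsilon_i\bigr)}\right) \tag{A20}
$$

Next, factoring $\bar{p}(j,\tau)$, $\bar{p}(j)$, and $\bar{p}(\tau)$ out of the
numerator of the summation terms, one arrives at:

$$
= \bar{p}(j,\tau)\left(1 - \frac{1}{N}\sum_{i=1}^{N}\frac{\epsilon_i}{\bar{p}(j,\tau)}\right)
\log\left(\frac{\bar{p}(j,\tau)\left(1 - \frac{1}{N}\sum_{i=1}^{N}\frac{\epsilon_i}{\bar{p}(j,\tau)}\right)}
{\bar{p}(j)\,\bar{p}(\tau)\left(1 - \frac{1}{N}\sum_{i=1}^{N}\frac{\epsilon_i}{\bar{p}(j)}\right)\left(1 - \frac{1}{N}\sum_{i=1}^{N}\frac{\epsilon_i}{\bar{p}(\tau)}\right)}\right) \tag{A21}
$$

which, after collecting terms, becomes:

$$
= \bar{p}(j,\tau)\left(1 - \frac{1}{N}\sum_{i=1}^{N}\frac{\epsilon_i}{\bar{p}(j,\tau)}\right)
\left[\log\left(\frac{\bar{p}(j,\tau)}{\bar{p}(j)\,\bar{p}(\tau)}\right)
+ \log\left(\frac{1 - \frac{1}{N}\sum_{i=1}^{N}\frac{\epsilon_i}{\bar{p}(j,\tau)}}
{\left(1 - \frac{1}{N}\sum_{i=1}^{N}\frac{\epsilon_i}{\bar{p}(j)}\right)\left(1 - \frac{1}{N}\sum_{i=1}^{N}\frac{\epsilon_i}{\bar{p}(\tau)}\right)}\right)\right] \tag{A24}
$$

Next, collecting the $\bar{p}(j,\tau)\log\left(\frac{\bar{p}(j,\tau)}{\bar{p}(j)\,\bar{p}(\tau)}\right)$
term, one gets:

$$
= \bar{p}(j,\tau)\log\left(\frac{\bar{p}(j,\tau)}{\bar{p}(j)\,\bar{p}(\tau)}\right) + G(\tau) \tag{A25}
$$

where $G$ is given by:

$$
G(\tau) = -\frac{1}{N}\sum_{i=1}^{N}\epsilon_i\,\log\left(\frac{\bar{p}(j,\tau)}{\bar{p}(j)\,\bar{p}(\tau)}\right)
+ \left(\bar{p}(j,\tau) - \frac{1}{N}\sum_{i=1}^{N}\epsilon_i\right)
\log\left(\frac{1 - \frac{1}{N}\sum_{i=1}^{N}\frac{\epsilon_i}{\bar{p}(j,\tau)}}
{\left(1 - \frac{1}{N}\sum_{i=1}^{N}\frac{\epsilon_i}{\bar{p}(j)}\right)\left(1 - \frac{1}{N}\sum_{i=1}^{N}\frac{\epsilon_i}{\bar{p}(\tau)}\right)}\right).
$$

3. Pseudocode for interpreting the TDMI for a population of time series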
Algorithm 1 How to interpret the TDMI for a population of time series

if there are enough points to estimate I_avg (usually ~100 pairs of points per
        representative individual are required) then
    estimate δI and H_e
    if δI > B_IRP then
        the population is heterogeneous
        if H_S ≈ 0 then
            supports (or ranges) are diverse or disjoint
        else if H_S ≈ 1 then
            supports (or ranges) are uniform
        end if
    else if δI < B_IRP then
        the population is homogeneous
    end if
    if H_e ≈ 0 then
        the population is well represented
    else if H_e ≈ 1 then
        portions of the population are overrepresented
    end if
else if there are not enough pairs to estimate I_avg then
    estimate I_agg, H_S, and H_e
    if H_S ≈ 0 then
        supports (or ranges) are diverse or disjoint
        if there are enough pairs of points per patient to estimate a PDF for each
                patient at the specific δt then
            estimate V_g(p) (i.e., V(p) relative to the abstract supports)
            if V_g(p) ≈ 1 then
                the population used to estimate I_agg has graph-based heterogeneity
            else if V_g(p) ≈ 0 then
                the population used to estimate I_agg is graphically homogeneous
            end if
        else if it is not possible to accurately estimate a PDF for each patient at
                the specific δt then
            it is not possible to determine the contribution of the graph-based
            heterogeneity to the overall heterogeneity
        end if
    else if H_S ≈ 1 then
        supports (or ranges) are uniform
        if V_g(p) ≈ 1 then
            the population used to estimate I_agg has graph-based heterogeneity
        else if V_g(p) ≈ 0 then
            the population used to estimate I_agg is homogeneous
        end if
    end if
    if H_e ≈ 0 then
        the population is well represented
    else if H_e ≈ 1 then
        portions of the population are overrepresented
    end if
end if

{NOTE: there are 10 possible sharp interpretations for both the δI and the
I_agg-only cases.}

{All TDMI interpretations should include: I-like quantities (e.g., I, δI, etc.),
population diversity qualification (support- and graph-based contributions to
diversity; if they are unknown, this should be specified), and the make-up of the
population used to estimate the I-based quantities (e.g., H_e).}

{NOTE: even under the best circumstances, it may be difficult to determine what
proportion of the heterogeneity is due to support-based versus graph-based diversity.}
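The branching logic of Algorithm 1 can be mirrored in a small function. The sketch below is a simplified, hypothetical rendering. The name `interpret_tdmi`, its argument names, and the tolerance `tol` used to decide when a diagnostic is "approximately 0" or "approximately 1" are all assumptions, since the algorithm does not specify numerical thresholds. Passing `delta_i` and `bias` selects the branch in which the average TDMI can be estimated; omitting them selects the aggregate-only branch. A difference equal to the bias is treated as homogeneous.

```python
def interpret_tdmi(delta_i=None, bias=None, h_support=None, h_represent=None,
                   v_graph=None, tol=0.25):
    """Return interpretation strings following the branches of Algorithm 1."""

    def near_zero(x):
        return x is not None and x <= tol

    def near_one(x):
        return x is not None and x >= 1 - tol

    notes = []
    if delta_i is not None and bias is not None:
        # Enough points per individual: compare the difference with its bias.
        if delta_i > bias:
            notes.append("population is heterogeneous")
            if near_zero(h_support):
                notes.append("supports are diverse or disjoint")
            elif near_one(h_support):
                notes.append("supports are uniform")
        else:
            notes.append("population is homogeneous")
    elif near_zero(h_support) or near_one(h_support):
        # Aggregate-only case: supports first, then graph-based variation.
        if near_zero(h_support):
            notes.append("supports are diverse or disjoint")
        else:
            notes.append("supports are uniform")
        if v_graph is None:
            notes.append("graph-based contribution cannot be determined")
        elif near_one(v_graph):
            notes.append("graph-based heterogeneity")
        elif near_zero(v_graph):
            notes.append("graphically homogeneous")
    if near_zero(h_represent):
        notes.append("population is well represented")
    elif near_one(h_represent):
        notes.append("portions of the population are overrepresented")
    return notes
```

For example, `interpret_tdmi(delta_i=0.3, bias=0.05, h_support=0.9, h_represent=0.1)` follows the first branch of the algorithm, while `interpret_tdmi(h_support=0.95, v_graph=0.9, h_represent=0.05)` follows the aggregate-only branch.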